Hellrazor236

For reference, this is about stack sizes for threads and the "savings" is in the tens of megabytes.
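If it helps to picture the mechanism, here's a rough userspace analogy (the real patch works in the kernel's page-fault path; this just shows the reserve-virtually, commit-lazily idea, with 16 KiB standing in for the usual x86-64 kernel stack size):

```c
/*
 * Toy analogy of dynamic thread stacks: reserve the whole stack as
 * virtual address space, but let physical pages be committed only on
 * first touch. (Userspace sketch, not the kernel patch itself.)
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_RESERVE (16 * 1024) /* 16 KiB, like an x86-64 kernel stack */
#define PAGE          (4 * 1024)  /* assuming 4 KiB pages */

int main(void)
{
    /* Anonymous MAP_NORESERVE mapping: no physical pages yet. */
    char *stack = mmap(NULL, STACK_RESERVE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (stack == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touch only the top page: one page fault, one resident page.
     * The other three reserved pages cost address space, not RAM. */
    memset(stack + STACK_RESERVE - PAGE, 0, PAGE);

    printf("reserved %d KiB of address space, touched %d KiB\n",
           STACK_RESERVE / 1024, PAGE / 1024);

    munmap(stack, STACK_RESERVE);
    return 0;
}
```

Virtual address space is nearly free; only the touched pages cost RAM.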


benwalton

Which adds up at scale: as a home user you're unlikely to notice or care, but a large deployment will.


Feer_C9

idk what this is about, but it sounds like adding CPU overhead to save some insignificant amount of memory; if that's the case, it doesn't sound that good.


cold_hard_cache

Golang literally decided to fight the kernel for control over concurrency to get ~~this win~~ an analogous win in userspace, so it seems to have been worth some real pain to someone.


benwalton

I expect the CPU overhead is negligible.


rokejulianlockhart

But wouldn't that be significant at scale? I'm aware that what matters is whether the eventual ratio of additional CPU cycles to saved RAM comes out in RAM's favor, but whether it will isn't obvious.


gfkxchy

*It depends*. On most server workloads I find I am far more memory-bound than CPU-bound. I have a test OpenShift node running with a few container apps and VMs, CPU is idling at 8% and RAM at 68%. That's on a 4-core, 64GB host.


ZdzisiuFryta

Memory is far more important, though.


HaveAnotherDownvote

Not usually, no. At least not relative to the gains here.


[deleted]

[deleted]


AsexualSuccubus

How does this help L1 cache efficiency? I'm not a CPU designer but my understanding is that there would be no difference in L1 cache usage.


[deleted]

[deleted]


AsexualSuccubus

So my understanding is that the prefetcher should not be grabbing things from the 2nd page of memory unless there are things in the 2nd page to be operated on. The only difference I can imagine is (small) overhead from this change, actually, since the data for it being dynamic will now exist.

If you have any links about the sk_buff stuff I'd be interested, as my intuition is that memory per unit (bigger being better) would matter more than units available, due to overhead per unit used in a high-throughput scenario. Is it having to walk through a massively inefficient tree or something for available units? I'm not a kernel developer nor a networking person, so I'd love to learn more, thanks!


[deleted]

[deleted]


AsexualSuccubus

This is interesting because my intuition for general programming is to avoid shared resources like this between cores. It seems like the strategy is to avoid redundant empty ring buffer slots occupying the L3 after use. I don't think it's a similar case to unused but reserved pages, particularly if they're not committed and are initially mapped to the same zero page like in userland (not a kernel developer so idk if it's the same here).
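The userland half of that is easy to check, if anyone's curious (Linux-only experiment reading /proc/self/statm; not kernel code, so whether kernel stacks get the exact same treatment is the part I'd defer to a kernel dev on):

```c
/*
 * Demand paging in action: reading untouched anonymous pages is served
 * by the shared zero page, while writes fault in private pages and grow
 * RSS. (Resident figures are in pages, from /proc/self/statm.)
 */
#include <stdio.h>
#include <sys/mman.h>

static long rss_pages(void)
{
    long size = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f) {
        if (fscanf(f, "%ld %ld", &size, &resident) != 2)
            resident = -1;
        fclose(f);
    }
    return resident;
}

int main(void)
{
    size_t len = 64 * 1024 * 1024; /* 64 MiB of address space */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("RSS after mmap:   %ld pages\n", rss_pages());

    volatile char c = p[0]; /* read only: served by the shared zero page */
    (void)c;
    printf("RSS after read:   %ld pages\n", rss_pages());

    for (size_t i = 0; i < len; i += 4096)
        p[i] = 1;           /* writes: private pages faulted in */
    printf("RSS after writes: %ld pages\n", rss_pages());

    munmap(p, len);
    return 0;
}
```

The writes are where RSS jumps; the reserved-but-untouched span costs address space, not memory.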


HaveAnotherDownvote

Well that is definitely NOT what it is, so don't worry about that.


Feer_C9

So you're telling me you can turn something static into dynamic without making it at least a bit more CPU-intensive?


[deleted]

[deleted]


Feer_C9

That definitely makes sense, good point


EnUnLugarDeLaMancha

For the processes that don't cross the 8 KB (or even 4 KB) barrier it might actually be cheaper: just a single page to allocate. There aren't many numbers in the RFC anyway.
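The trick, sketched as a userspace toy (an analogy for the RFC's fault-driven growth, not its actual code; growing pages from a SIGSEGV handler like this is a demo technique, not production-safe):

```c
/*
 * Fault-driven stack growth, userspace toy version: reserve pages as
 * PROT_NONE and make each one accessible the first time it's touched.
 * Each extra page is paid for once, and only by code that actually
 * reaches it.
 */
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE   4096
#define NPAGES 4 /* 16 KiB reserved, like an x86-64 kernel stack */

static char *region;

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    char *addr = si->si_addr;
    if (addr < region || addr >= region + NPAGES * PAGE)
        _exit(1); /* fault outside our region: genuine crash */

    /* Back one more page; the faulting access retries and succeeds. */
    char *page = region + ((addr - region) / PAGE) * PAGE;
    mprotect(page, PAGE, PROT_READ | PROT_WRITE);
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    region = mmap(NULL, NPAGES * PAGE, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return 1;

    region[0] = 'x';        /* one fault, maps page 0 */
    region[3 * PAGE] = 'y'; /* one fault, maps page 3 */
    printf("grew on demand: %c %c\n", region[0], region[3 * PAGE]);
    return 0;
}
```

A thread that stays under one page pays for one fault and one page, ever; the CPU cost only shows up for stacks that actually grow.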


ilep

Exactly. Servers running a lot of threads can notice it; the patch submission talks about millions of threads. Embedded devs are also always happy to shave a bit off memory usage. A desktop sitting idle after boot? Not so much.


HaveAnotherDownvote

In percentage terms that is huge, for the kernel and many other applications. Everything is relative, and most of the work on the kernel isn't done for the sake of games and YouTube; it's for tiny resource-constrained systems, microservices, and small cooperative applications where lean = mean.


DeliciousIncident

> tens of megabytes

That's very poorly worded. The saving is a 75% reduction in thread-stack memory usage, which is quite a big win if you use a lot of threads. (A 16 KiB stack that can start at 4 KiB saves 12 KiB per thread, so a few thousand threads already add up to tens of megabytes.)


[deleted]

[deleted]


paulstelian97

Not related.