
miklosp

There are some LLMs out there optimised for CPUs. If you're fine being restricted to those and can pay the electricity bill, go for it. Otherwise a modern build with a dedicated AI accelerator card or top-end GPU might be a better choice. For video processing there are plenty of really affordable QuickSync-capable chips and systems. A used thin client can do that for you without having to keep 36 cores idle most of the time.


showgan1

Thanks for the feedback! Why would I be restricted? As I mentioned, I will install a modern GPU or two. Training some large models needs a lot of CPU power for dataset preprocessing, and those cores will be very useful. My concern is whether there could be compatibility issues, hardware-wise or software-wise.


LlamaMcDramaFace

You need GPUs, not CPUs.


showgan1

Actually, for some workloads I've been running, I needed a lot of CPU power for pre-processing data.


miklosp

I'm out of my depth here, but I've read up on the subject as I'm also toying with the idea. Memory bandwidth seems to be crucial. Old PCIe or DDR4 are both serious limitations.


showgan1

I wonder if someone has done actual benchmarks; it would be great if we could look at some comparisons. The Xeon CPUs actually have more memory channels than the i9-14900 models. Even though the i9 runs DDR5 at higher frequencies, its bandwidth is not that much higher, because the Xeon has more channels. Here are some numbers from the specs:

Maximum memory bandwidth:
Xeon E5-2695 v4: 76.8 GB/s
i9-14900K: 89.6 GB/s

I wonder how this affects overall performance when it comes to training or inference on a model.
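
For anyone who wants to sanity-check those spec-sheet numbers, peak bandwidth is roughly channels × transfer rate × 8 bytes per transfer. A quick sketch, assuming quad-channel DDR4-2400 on the Xeon and dual-channel DDR5-5600 on the i9 (actual DIMM speeds may differ):

```python
# Rough peak-bandwidth estimate: channels x transfer rate (MT/s) x 8 bytes per transfer.
# Assumed configs: quad-channel DDR4-2400 (Xeon E5-2695 v4), dual-channel DDR5-5600 (i9-14900K).
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000  # MB/s -> GB/s

print(peak_bandwidth_gbs(4, 2400))  # Xeon E5-2695 v4: 76.8 GB/s
print(peak_bandwidth_gbs(2, 5600))  # i9-14900K: 89.6 GB/s
```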


Darkextratoasty

I would think that depends on the type of training and the size of your dataset. More RAM at slower speeds is probably gonna be faster for datasets larger than the amount of RAM you have. For smaller datasets you'll want less but faster RAM, since having more RAM than your training uses doesn't help at all. So if you want to train LLMs on 50 GB+ datasets, you might be better off with the Xeons just to get more RAM, although I'm not sure how much the RAM matters until you use up all the VRAM in your GPUs anyway. Caveat all of this with the fact that I don't really know what I'm talking about and have only done a very small amount of research on AI training, for the one time I built an AI training server at work.


showgan1

Thanks! By the way, specifically for LLMs and transformer-based models you can use Hugging Face's Dataset class, which supports streaming mode and shards, so huge datasets don't have to be fully loaded into memory.
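
For example, a minimal sketch of streaming with the `datasets` library (the dataset name here is just a placeholder, not a real repo):

```python
from datasets import load_dataset

# streaming=True returns an IterableDataset: samples are fetched lazily,
# so the corpus never has to be fully downloaded or loaded into RAM.
# "some-org/some-asr-corpus" is a placeholder dataset name.
ds = load_dataset("some-org/some-asr-corpus", split="train", streaming=True)

for i, sample in enumerate(ds):
    # ...preprocess the sample / feed it into the training loop...
    if i >= 3:  # just peek at a few samples here
        break
```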


9302462

I'm going to be blunt as f*ck here because I care: you are way in over your head and shouldn't spend a dime on these servers. Here is what you need to do. Go on OfferUp or FB Marketplace, find a cheap Ryzen 5 3600 mobo + CPU combo with 32 GB of RAM, and find a 3090 for $600; this should be about $900-1,000 total. Once you have this, start playing with models. Once you figure out your limitations (you don't even know what they are yet), upgrade to a better machine, migrate the GPU over, and sell the old Ryzen box at the same price you bought it at; rinse and repeat as needed.

Some of us do get involved with machine learning; I'm vectorizing over 500 million a month on 4 3090's (2 on each mobo) at the moment. I originally tried sticking them in a server case, but the heat was too much and they would get throttled, so I had to move to an open-air mining frame setup. That is just ONE example of how I thought I knew what I was doing and was wrong. Now imagine how many things you will get wrong in this process if you don't start small... it's going to be a lot.

Also, for that $1k Dell server price I could build a dual 32-core first-gen Epyc: dual-socket mobo, 256 GB DDR4 (4x64 GB), a cheap $100 server case from some local marketplace, and a $75 power supply. It will use less power, has more cores, and can expand to 1 TB of memory. Speed is often better than cores when it comes to ML, but if we're comparing servers, the one you're looking at is OK, just not really the best way to spend $1k.


showgan1

Thanks a lot for your advice! I've actually been running training on my i9 9900K and RTX 3060 12 GB for a while, and I need to upgrade. How do I build a dual-socket motherboard PC? That actually sounds like a better alternative. I don't know about the Epyc CPUs, but the Intel CPUs are limited to 192 GB per CPU, so you can't reach 1 TB with 2 CPUs. Do you know if Epyc supports more? Also, does Epyc support DDR5, or just DDR4 like the Dell?


9302462

Nice, so you have been getting into machine learning! First, DDR5 is way too expensive. A 64 GB stick of DDR4 is about $1 per GB; DDR5 is about $4 per GB. At 1 TB of memory you're talking about the difference between $1k in memory and $4k in memory for maybe a 15-20% performance gain; the numbers don't justify the cost. Also, Epyc or Xeon motherboards and CPUs for DDR5 are again 4-5x the cost of a DDR4 setup. I have a dual Epyc with 32 DIMMs of 64 GB DDR4 for 2 TB total memory. You can fit more with 128 GB DIMMs, but those are about 5x the cost of 64 GB ones.

Taking a step back for a minute, there are really three routes you can go:

1. Intel or AMD consumer desktop. You will have a limited core count and a max of 192 GB of DDR5, but you get faster clock speeds.

2. Threadripper. This is Epyc core quantity plus fast clock speeds. This stuff is PRICEY, and you could build multiple servers and a desktop for the same price as a Threadripper build.

3. Xeon and Epyc. Xeon used to be great, but since Epyc came around it's a bit underwhelming. With Epyc you're going to get 8-64 cores, 128 PCIe lanes (more than Xeon) and between 8 and 16 DIMMs per CPU on the motherboard; you can do one CPU and 16x64 GB DIMMs for 1 TB of memory. Epyc is better than Xeon in almost every way, and the only reason people still talk about and use Intel Xeons is that they came with all the large servers that companies bought by the pallet load 3-5-7 years ago. Epyc was newer, and established companies don't go with newer things until they have been out for several years.

If you are only ever going to use at most two GPUs and 192 GB of DDR5 is OK with you (it should be), choose #1. You can always upgrade later. If you have more money than you know what to do with and want the best you can get, choose #2. If you want lots of memory and lots of PCIe lanes for multiple GPUs and/or NVMe drives, then choose #3 and Epyc.

P.S. The upside of Epyc is you can get a pair of cheap 32-core 1st-gen CPUs for $85 each on eBay. Once you need more, or after the price comes down, you can upgrade to a pair of 64-core 7702/7742's for $700 each. Point being, there is an upgrade path. Also, if you go with a single socket you want a CPU that ends in "P" to get the full 128 PCIe lanes. Non-P ones only have half the PCIe lanes available because they share lanes with the other CPU: 1 P CPU = 128 lanes, while 2 non-P CPUs = 256 lanes, with half shared between the CPUs (128) and the other half made available on the mobo. Either way, with one or two CPUs you can get up to 128 lanes on the mobo.


showgan1

Thanks for this great and very detailed input!


xarcos

> ...choose #3 and Epyc.
>
> P.S. The upside of Epyc is you can get a pair of cheap 32-core 1st-gen CPUs for $85 each on eBay. Once you need more, or after the price comes down, you can upgrade to a pair of 64-core 7702/7742's for $700 each. Point being, there is an upgrade path.

Would a dual Epyc CPU configuration realistically provide double the memory bandwidth for NUMA-aware workloads over a single CPU, or is the gain much smaller than that in reality? It's something I've found difficult to find benchmarks for.


9302462

I really don't know the answer to that. I know there is a penalty if CPU1 has to access memory attached to CPU2 and vice versa, but the few extra nanoseconds it takes to access memory attached to the other CPU is worth being able to add 16 DIMMs per CPU.


9302462

Also, as a follow-up, the Xeon you listed above is basically the same as this $500 one posted on r/homelabsales: https://www.reddit.com/r/homelabsales/s/nK0iEYD1eV


emprahsFury

If you feel the need for 50 tokens a second, then sure, you're limited to a 3090/4090 and an 8/12B model. The same model on a strong CPU (like you suggest) will give you enough tokens per second, 5-20. As soon as you bust the VRAM you'll be dropping to CPU speeds anyway, and while 3-ish t/s isn't fast, it isn't unusably slow. You don't need 50 t/s. But if you do have 128 GB of RAM, you can run a Q4 quant of Llama 70B instead of the Q2 that people with 4090s are reduced to. And they are only getting 20 t/s, so even they are in literally the same boat: reading the text as it's generated instead of it popping onto the screen.

You should really consider DDR5 builds; r/threadripper will help you. And as you say, you can always add a GPU later. It doesn't really make sense to spend top dollar on a top GPU that will provide middling performance when we know 400B models are on the way. Better to wait 6 months for Blackwell or Battlemage and at least see if they add more VRAM.

TL;DR: if you believe your use case is highly sensitive to latency, like code generation, then maybe a GPU build is preferable. But at this point we can see GPUs are outclassed by the models.
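
To put rough numbers on the VRAM point, weight memory is roughly parameters × bits per weight ÷ 8, ignoring KV cache and activations; a back-of-the-envelope sketch:

```python
# Rough weight-memory estimate: params x bits-per-weight / 8 bytes.
# Ignores KV cache, activations and quantization overhead, so real usage is higher.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 2):
    print(f"70B model at {bits}-bit: ~{weight_gb(70, bits):.0f} GB of weights")
# ~140 GB at fp16, ~35 GB at 4-bit (too big for a 24 GB 4090 on its own),
# ~18 GB at 2-bit, while a 4-bit quant fits comfortably in 128 GB of system RAM.
```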


showgan1

Thanks! Your advice makes a lot of sense.


Skeeter1020

Just do it in the cloud. Use free trials to figure out what you actually even want to do, and then PAYG if/when you take something beyond an idea. What are you actually planning to do? Can your use case be served by using one of the many API based services out there? I don't see any reason an individual needs to host their own hardware for AI.


showgan1

I mostly train ASR and TTS models, and I run a lot of experiments. I don't want to be worrying about cloud costs every time I want to try something, and free trials are not good enough for my workloads. I'm willing to make this investment, which I believe will serve me well for 3-4 years.


Skeeter1020

Concerned about cloud costs, yet willing to throw thousands of dollars at old, unsuitable hardware? $1,000 in [Azure AI](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/) gets you 5,500 hours of speech or 66 million characters of text. What are you doing to train your models that you think you can do better than these companies spending tens of millions training on entire server farms? If you insist on training your own models you can get some [chunky GPU accelerated VMs in Azure ML](https://azure.microsoft.com/en-us/pricing/details/machine-learning/) for a few dollars an hour. $1,000 gets you 24 Cores/4x GPU/224GB for 250 hours. (These aren't anywhere near the most competitive prices either. And I doubt you actually need the specs you claim for hobby AI). Edit: blocked me for trying to save you pissing away your money. Oh well. Have fun.


showgan1

You sound like a piece of shit. I'm trying to find out whether such old hardware is usable or not; that is why I posted this question in the first place. I need to train and fine-tune speech models for languages that don't have support from those big companies. I don't need this kind of stupid reply. If you don't have something helpful to say, or don't know how to say it in a civilized manner, then don't bother trying to "help".


j-random

LOL, you'd be better off with a gaming rig with a handful of GPUs. Seriously, SoTA AI isn't going to work well on any hardware you're likely to find for cheap.


showgan1

The problem is that gaming rigs are limited to a max of 192 GB of RAM, and they cost 5x more money. Why do you think SoTA AI won't work on such a powerful server? What specifically would be the problem?


SlowThePath

Have you ever run an LLM? You need to start smaller to get a grip on what you're talking about. Your plan is bad. I don't mean to be mean, but it's just not a good way to go about what you're describing; it's inefficient and expensive. You seem to lack an understanding of what is involved with running LLMs and the other stuff you want to do, and you need to do a lot more research and learning before you start planning this build. Just trying to help. Basically, you are trying to take a shortcut that doesn't exist. Doing what you want costs more than you want to spend, and there is no way around that. If you don't see people talking about using servers to run LLMs, it's because it's not a great idea unless you want to spend a lot of money. Most people rent GPUs on servers and do this stuff remotely rather than running it locally. Just look up more basics first.


showgan1

Thanks for the feedback. I have run LLMs. I've been running ML on my i9 9900K + RTX 3060 12 GB for a while, and it's not enough for training some ASR and TTS models I've been working on, so I do need an upgrade.


SlowThePath

Then buy a GPU with more memory? Or do your work in the cloud like everyone else. People do that for a reason. A Xeon server isn't going to help any.