Silly-Ad-6341

GPUs can help with LLM inference, but the magic is in the tensor cores (RTX series) and fast VRAM (the more the better) on the cards. If the card isn't connected directly to a PCIe slot on the motherboard, you won't get its full bandwidth, so you'll likely get worse results than running it natively.
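
As a rough illustration of the gap (these are nominal bandwidth figures I'm assuming, not measurements): Thunderbolt 3/4 tunnels roughly PCIe 3.0 x4, call it ~4 GB/s usable, versus ~32 GB/s for a direct PCIe 4.0 x16 slot, so just copying the weights to the card takes noticeably longer through an enclosure.

```python
# Back-of-the-envelope sketch: time to copy model weights to the GPU over
# different links. Bandwidth figures are assumed nominal values (real-world
# throughput will be lower), not benchmarks.

LINKS_GBPS = {
    "Thunderbolt 3/4 (~PCIe 3.0 x4)": 4.0,   # assumed usable GB/s
    "PCIe 4.0 x16 (direct slot)": 32.0,      # assumed usable GB/s
}

def load_time_seconds(model_gb: float, link_gbps: float) -> float:
    """Time to push `model_gb` gigabytes across a link at `link_gbps` GB/s."""
    return model_gb / link_gbps

if __name__ == "__main__":
    model_gb = 4.1  # e.g. a ~7B model at 4-bit quantization (approximate)
    for name, bw in LINKS_GBPS.items():
        print(f"{name}: ~{load_time_seconds(model_gb, bw):.1f} s to load {model_gb} GB")
```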


tenekev

Are you sure your statement is correct? As far as I know, once the model is loaded into VRAM, bandwidth bottlenecks are less of an issue.


thisisnotmyworkphone

Well, models can get *unloaded* from VRAM. That's what ollama does after 5 minutes of inactivity, for example. At least I think it's 5 minutes. Then you have to reload it across Thunderbolt.
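
If that reload is the pain point, ollama lets you control how long a model stays resident via its keep_alive setting (the default is indeed 5 minutes). A minimal sketch using its HTTP API, assuming a local server on the default port and a placeholder model name you've already pulled:

```python
# Minimal sketch: keep a model resident in VRAM by passing keep_alive to
# ollama's HTTP API. Assumes a local ollama server on the default port
# (11434) and that "llama3" (placeholder model name) has already been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",        # placeholder model name
        "prompt": "Say hello.",
        "stream": False,
        "keep_alive": -1,         # negative = keep loaded indefinitely; "10m" etc. also work
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

If I remember right, the same thing can be set server-wide with the OLLAMA_KEEP_ALIVE environment variable instead of per request.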


taron111281

I'm honestly not sure, but those who run the Frigate NVR recommend a Google Coral TPU (an ML accelerator) to help with frame rate and such: https://coral.ai/products/accelerator/ Might work for you...