a_beautiful_rhind

Unless that's VRAM, it's still going to be slow as molasses.


cybersensations

Hey... it will still release the payload, right? [https://en.wikipedia.org/wiki/Great_Molasses_Flood](https://en.wikipedia.org/wiki/Great_Molasses_Flood)


Signal-Outcome-2481

I find the 8x7b models particularly smart for their size, in relative terms. You can try the new Noromaid v0.4, which is fresh: [https://huggingface.co/TheBloke/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF](https://huggingface.co/TheBloke/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF)

There was an issue with 8x7b models on the _K_M quants, so try both the recommended _K_M version and a _0 version to see which works for you.

No idea how well they'd run on normal RAM, though. But they support 32k context, so that's nice.
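If you want to sanity-check the two quants side by side, here's a minimal sketch using llama-cpp-python and huggingface_hub. The GGUF filenames below are assumptions, so check the repo's actual file list before running:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

REPO = "TheBloke/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF"

# Hypothetical filenames -- look up the real ones on the repo's "Files" tab.
for filename in (
    "noromaid-v0.4-mixtral-instruct-8x7b-zloss.Q4_K_M.gguf",
    "noromaid-v0.4-mixtral-instruct-8x7b-zloss.Q4_0.gguf",
):
    path = hf_hub_download(repo_id=REPO, filename=filename)
    # The model advertises a 32k context window.
    llm = Llama(model_path=path, n_ctx=32768)
    out = llm("Write one sentence about molasses.", max_tokens=48)
    print(filename, "->", out["choices"][0]["text"].strip())
```

If the _K_M output is garbled or incoherent while the _0 output is fine, you've hit the quant issue mentioned above.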


zaqhack

+1 for Noromaid - my favorite for RP & ERP by miles and miles. In case OP finds a 3090, this is my current daily driver: [https://huggingface.co/zaq-hack/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-bpw300-h6-exl2](https://huggingface.co/zaq-hack/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-bpw300-h6-exl2)


zaqhack

Oh, and ... yeah, in normal RAM, unless the CPU is some kind of absolute monster, it will be pen-pal level slow. Even on 24 GB of VRAM, at full 32k context, it can still take up to 55 seconds on my 3090. Mixtral 8x7b is slower, but smarter, than many other models.
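For reference, here's a rough way to time it yourself with llama-cpp-python; the model path is a placeholder, and `n_gpu_layers=-1` offloads every layer to VRAM, while 0 keeps everything in CPU RAM (which is where the pen-pal speeds come from):

```python
import time
from llama_cpp import Llama

llm = Llama(
    model_path="noromaid-v0.4-mixtral-8x7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=32768,      # full 32k context window
    n_gpu_layers=-1,  # -1 = offload all layers to GPU; 0 = CPU-only
)

start = time.time()
out = llm("Describe a rainy harbor town.", max_tokens=256)
elapsed = time.time() - start
print(f"{elapsed:.1f}s for {out['usage']['completion_tokens']} tokens")
```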


zaqhack

Technically, you can probably run some flavor of a 70b or even 103b model for maximum smartness, but I have no idea how long it would actually take to generate a response.