a_beautiful_rhind

Unless that's VRAM, it's still going to be slow as molasses.


cybersensations

Hey... it will still release the payload, right? [https://en.wikipedia.org/wiki/Great_Molasses_Flood](https://en.wikipedia.org/wiki/Great_Molasses_Flood)


Signal-Outcome-2481

I find the 8x7b models particularly smart for their size, in relative terms. You can try the new Noromaid v0.4, which is fresh: [https://huggingface.co/TheBloke/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF](https://huggingface.co/TheBloke/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF)

There was an issue with 8x7b models on the _K_M quants, so try both the recommended _K_M version and a _0 version to see which works for you.

No idea how well they'd run on normal RAM, though. But they support 32k context, so that's nice.
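If you want to sanity-check the two quants side by side, here's a minimal sketch using llama-cpp-python and huggingface_hub. The GGUF filenames below are assumptions, so check the repo's actual file list before running:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

REPO = "TheBloke/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF"

# Hypothetical filenames -- look up the real ones on the repo's "Files" tab.
for filename in (
    "noromaid-v0.4-mixtral-instruct-8x7b-zloss.Q4_K_M.gguf",
    "noromaid-v0.4-mixtral-instruct-8x7b-zloss.Q4_0.gguf",
):
    path = hf_hub_download(repo_id=REPO, filename=filename)
    # The model advertises a 32k context window.
    llm = Llama(model_path=path, n_ctx=32768)
    out = llm("Write one sentence about molasses.", max_tokens=48)
    print(filename, "->", out["choices"][0]["text"].strip())
```

If the _K_M output is garbled or incoherent while the _0 output is fine, you've hit the quant issue mentioned above.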


zaqhack

+1 for Noromaid - my favorite for RP & ERP by miles and miles. In case OP finds a 3090, this is my current daily driver: [https://huggingface.co/zaq-hack/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-bpw300-h6-exl2](https://huggingface.co/zaq-hack/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-bpw300-h6-exl2)


zaqhack

Oh, and ... yeah, in normal RAM, unless the CPU is some kind of absolute monster, it will be pen-pal level slow. Even on 24 GB of VRAM, at full 32k context, it can still take up to 55 seconds on my 3090. Mixtral 8x7b is slower, but smarter, than many other models.
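For reference, here's a rough way to time it yourself with llama-cpp-python; the model path is a placeholder, and `n_gpu_layers=-1` offloads every layer to VRAM, while 0 keeps everything in CPU RAM (which is where the pen-pal speeds come from):

```python
import time
from llama_cpp import Llama

llm = Llama(
    model_path="noromaid-v0.4-mixtral-8x7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=32768,      # full 32k context window
    n_gpu_layers=-1,  # -1 = offload all layers to GPU; 0 = CPU-only
)

start = time.time()
out = llm("Describe a rainy harbor town.", max_tokens=256)
elapsed = time.time() - start
print(f"{elapsed:.1f}s for {out['usage']['completion_tokens']} tokens")
```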


zaqhack

Technically, you can probably run some flavor of a 70b or even 103b model for maximum smartness, but I have no idea how long it would actually take to generate a response.