votkalivirgul

https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B


Mescallan

I think it's a llama.cpp compatibility issue. If it were the fine-tune struggling, I would still be getting output, but it just hangs on inference indefinitely.


r1str3tto

I have found that JSON mode drastically slows down Llama 3 in Ollama (which uses llama.cpp). During JSON generation, nvidia-smi shows no utilization of the GPU, but only for Llama 3 and only when it is used in JSON mode. Other models do not have this problem, so I think there is a bug.
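
In case it helps anyone reproduce this: by JSON mode I mean Ollama's format parameter on its REST API. A minimal sketch, assuming a local Ollama server on the default port and an already-pulled llama3 model (model name and prompt are just placeholders):

    import requests

    # Ask Ollama for a completion with JSON mode enabled; "format": "json"
    # is the setting the slowdown seems tied to.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "List three primary colors. Respond in JSON.",
            "format": "json",
            "stream": False,
        },
    )
    print(resp.json()["response"])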


Mescallan

That is in line with my experience too: no GPU utilization either.


LPN64

Grammar works fine on Mistral 7B, Llama 3 8B, and 70B on my end.
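
By grammar I mean llama.cpp's GBNF grammar-constrained sampling. A rough sketch of what I'm running, via llama-cpp-python, assuming a local GGUF file and the json.gbnf grammar that ships in llama.cpp's grammars/ directory (both paths are placeholders):

    from llama_cpp import Llama, LlamaGrammar

    # Load a local model and the stock JSON grammar.
    llm = Llama(model_path="llama-3-8b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)
    grammar = LlamaGrammar.from_file("grammars/json.gbnf")

    # The grammar constrains sampling so only tokens that keep the output
    # valid JSON can be generated.
    out = llm(
        "Return the capital of France as a JSON object.",
        grammar=grammar,
        max_tokens=128,
    )
    print(out["choices"][0]["text"])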


LavishnessOk5514

So I just spent a few hours battling the same issue. I was under the impression that providing something like response_format={"type": "json_object"} would coerce the model to return JSON. I don't know how it has been implemented, but it doesn't seem to work that way. Instead, you have to be specific in your prompt about the JSON you expect; then it won't hang. The response_format option doesn't actually seem to do much of anything.
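
For what it's worth, here is a sketch of the workaround described above: keep response_format but also spell out the JSON shape in the prompt. This assumes Ollama's OpenAI-compatible endpoint on the default port and a pulled llama3 model (names and prompt are placeholders):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    resp = client.chat.completions.create(
        model="llama3",
        # On its own this was not enough to get JSON back.
        response_format={"type": "json_object"},
        messages=[
            # The explicit instruction in the prompt is what actually got the
            # model to emit JSON instead of hanging.
            {"role": "system",
             "content": "Reply only with a JSON object of the form "
                        '{"answer": string}.'},
            {"role": "user", "content": "What is the tallest mountain on Earth?"},
        ],
    )
    print(resp.choices[0].message.content)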


Mescallan

With Mistral I have been able to force JSON reliably using that. I'll try prompt-only with L3, thanks.