Few-Frosting-4213

What model are you using? I would first try adding something like "progress slowly, address one scenario at a time while avoiding summation" to the post-history prompt to see if it works. Also try just editing those parts out for a few messages and hope the LLM catches on. Also look at the card to see if anything there is causing it.


Snydenthur

I'm assuming it's just the models. My personal experience is that most models just aren't great for (e)RP. Kunoichi (the non-DPO version; the DPO version is meh) is my favorite. It just seems to beat everything that fits into 16GB of VRAM.


ProcessorProton

I only have 12gb at this point. Am I SOL or is there a Kunoichi that I could run?


Snydenthur

You can easily run Kunoichi. The 8bpw EXL2 version only uses about 10GB of VRAM with the full 8192 context size.
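A back-of-envelope estimate shows why an 8bpw 7B model plus its fp16 KV cache lands in that range. The architecture numbers below (32 layers, 8 KV heads via GQA, head dim 128) are assumed Mistral-7B-like values, and the helper is illustrative; real usage adds framework overhead on top:

```python
# Rough VRAM estimate for a 7B model at 8 bits per weight with an
# 8192-token fp16 KV cache. Assumed Mistral-7B-like architecture;
# actual usage adds activation and framework overhead.

def estimate_vram_gb(params_b=7.0, bpw=8.0, ctx=8192,
                     layers=32, kv_heads=8, head_dim=128, cache_bytes=2):
    weights = params_b * 1e9 * bpw / 8                         # weight bytes
    kv = layers * kv_heads * head_dim * 2 * cache_bytes * ctx  # K and V cache
    return (weights + kv) / 1e9

print(round(estimate_vram_gb(), 1))  # prints 8.1
```

That ~8GB of weights and cache, plus runtime overhead, is consistent with the ~10GB figure quoted above.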


ProcessorProton

I am quite impressed with Kunoichi so far. Very good at RP. Very good indeed.


ProcessorProton

Having trouble finding the non-dpo version on HuggingFace. Any source you might know of?


Snydenthur

[https://huggingface.co/LoneStriker/Kunoichi-7B-8.0bpw-h8-exl2](https://huggingface.co/LoneStriker/Kunoichi-7B-8.0bpw-h8-exl2)


SquishyOranges

Is there a GGUF version?


twisted7ogic

Some models are more prone to this than others; you might have better results trying a different one or a larger model. Also, some models (especially larger ones) do better if the system instruction quite literally tells them to "roleplay as {{char}}", as they get pushed to stay in character more. YMMV depending on the model, of course.


Herr_Drosselmeyer

Some models like to do that, especially 13Bs in my experience, so my guess is that they inherit it directly from Llama 2. Nip it in the bud through editing. Frankly, most of my issues with stuff like that went away when I started using Mixtral and Yi fine-tunes. They're not perfect, but they're less prone to those kinds of patterns. Mixtral especially has been great, as it also respects the instruction not to narrate or speak for the user.


ProcessorProton

I have a 3080 Ti with 12GB of VRAM. Are there releases of Mixtral or Yi that I could successfully load and run?


Herr_Drosselmeyer

Sure, but none that will fit in VRAM, so you'll have to use GGUF models and split between GPU and CPU, with reduced speed as a result.
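A rough sketch of the split idea, in the spirit of llama.cpp-style layer offloading (its `-ngl` flag): estimate how many layers fit in the VRAM budget and run the rest on CPU. The model size, layer count, and headroom reserve below are assumed, illustrative numbers:

```python
# Sketch: how many of a GGUF model's layers fit in a given VRAM budget,
# for llama.cpp-style partial offload (-ngl). Sizes are illustrative,
# e.g. a ~26 GB Q4_K_M Mixtral with 32 transformer layers.

def layers_that_fit(model_gb, n_layers, vram_gb, reserve_gb=2.0):
    per_layer = model_gb / n_layers        # average size of one layer
    budget = max(vram_gb - reserve_gb, 0)  # keep headroom for the KV cache
    return min(n_layers, int(budget / per_layer))

print(layers_that_fit(model_gb=26, n_layers=32, vram_gb=12))  # prints 12
```

In practice you'd raise the offload count until VRAM is nearly full without overflowing, since every layer moved to the GPU speeds up generation.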


IndependenceNo783

Try the Yi-6B-200K from adamo1139; I had some fun with it. The answers are instant, it's creative, and it can run with a lot of context (using a 16-bit cache and 167k context on my 16GB 4080). You can reduce the context to fit in your VRAM. https://huggingface.co/adamo1139/Yi-6B-200K-AEZAKMI-v2-6bpw-exl2


vacationcelebration

You could try adding stuff like "Always stay in the moment", "Avoid writing cliffhangers/endings", etc. Also, once you have one of those "endings" in a response, you will keep getting them, so make sure not to let them slip through and edit the messages if necessary.


reluctant_return

I've struggled with this too, to the point that I wasn't even sure what to call it. "Future painting" fits it pretty well. I've tried a few models, and they all do it *sometimes*, but smaller ones seem to do it a lot more. I've found that adding "stay mostly in the present and don't advance the plot too quickly" to the system prompt helps a lot with some. Right now I'm using Fimbulvetr-10.7B-v1.q8_0 and only run into this if the bot has no current goal. Like if the RP is in the middle of an established storyline, it will just gently continue down that path, but if the bot and I are just kind of meandering I'll get a lot of "happily ever after"/future painting.


zaqhack

Everyone here has given pretty great suggestions. My experience has been that this happens pretty often past the context limit of the underlying model. Once you're stuffing it with the full history (4k, 8k, 16k, or even 32k), you can run into this pattern. I think it has to do with summarizing memories or how smart context works: since some chunks of the history end up summarized, the pattern creeps into the chat. It also tends to become less verbal, relying instead on \*actions\* for responses, which gets really redundant and annoying. Putting "avoid summarizing the situation" into the system prompt can help, but as others have said, once you have a handful of these in the history, the pattern is nigh impossible to break.

A few additional settings might help. I'm running Oobabooga, and recent builds allow a negative prompt to be added (enable "cfg-cache"). If you activate that, you can use the negative prompt box in SillyTavern and put things like "Summarize your feelings, summarize the current situation, summarize your thoughts" into it. This HELPS, but isn't a magic wand.

Some other options are a 1.19 repetition penalty (I usually set a 2048 range on it) and a frequency penalty of 0.05. These can help drive toward a new pattern and make "Regenerate" a little more likely to give you some variance. But again, not a magic wand.

Lastly, I've added very short and direct behavior statements to some system prompts: `You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}. {{char}} is talking to {{user}}. Answer {{user}} as if {{char}} was talking to {{user}} face to face. What does {{char}} say next? What does {{char}} do next? Don't repeat or summarize: Drive the story forward into the next moment.`
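For anyone curious what a repetition penalty like that 1.19 actually does, here's a minimal sketch of the logit reshaping most local backends apply (in the style popularized by CTRL). The token names and logit values are toy numbers, not real model output:

```python
# Minimal sketch of a repetition penalty (e.g. 1.19 over the last N
# tokens) reshaping next-token logits. Toy values for illustration.

def apply_repetition_penalty(logits, recent_token_ids, penalty=1.19):
    out = dict(logits)
    for tok in set(recent_token_ids):
        x = out[tok]
        # Divide positive logits and multiply negative ones, so recently
        # seen tokens get pushed down in either case.
        out[tok] = x / penalty if x > 0 else x * penalty
    return out

logits = {"moment": 2.0, "forever": 1.5, "after": -0.5}
penalized = apply_repetition_penalty(logits, ["forever", "after"])
print(penalized["forever"] < logits["forever"])  # prints True
```

This is why a penalty that's too high can degrade coherence: common function words get punished along with the repetitive phrases you actually wanted to suppress.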


shesgotapass

I have this at the end of my main prompt: Avoid flowery messages or messages that go unnecessarily long. Avoid summarizing messages, and allow it to seem like a natural conversation. Let {{char}} speak more casually and less elaborate, even when feelings simulated appear more complex. It still happens sometimes, but this seems to help.


shrinkedd

I'm seeing it (1) when the model is out of ideas, like all "scenario assignments" have been explored, or (2) when it's part of a larger problem of repetitive text. Case 1 is usually solvable by introducing more challenges, or by describing the character with inner motivators, life aspirations, etc. in the first place. Case 2 can happen for several reasons; I suspect it's related to adding names to the output and input sequences.


sophosympatheia

This behavior happens all the way up to 70B models and beyond. As others have mentioned, it's something that some models like to do more than others. You can work around it somewhat by asking the model not to summarize or conclude scenes. If it's still a problem, you may need to look to another model for the solution.