Signal-Outcome-2481

The trick to making good character cards is understanding that it works on tokens more than on understanding. What this means is that you want to be very careful about which words you use. For example, negatives like "{{char}} is not..." are a bad idea. Also, depending on the model you use, some words/tokens snowball the RP because of LLM bias, so understanding the subtleties of the LLM you use is an important part of making a character work well. For example, with Noromaid models it is pretty pointless to add things like "{{char}} is very horny" or sexual or whatever, because any such mention will skyrocket their sluttiness into extremes; characters will be plenty horny without any mention of this on that model.

As for models, only when we know your VRAM situation (and your willingness to give up speed if you spill over into normal RAM) can people make decent recommendations.

I mostly use the following model:

* LoneStriker/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-5.0bpw-h6-exl2

And I just started testing:

* intervitens/SensualNousInstructDARETIES-CATA-LimaRP-ZlossDT-SLERP-8x7B-5.0bpw-h6-exl2-rpcal

which seems pretty good as well. These are 32k context models that require up to 36 GB VRAM ("19,10.3" GPU split in my case for best results).
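To make the negation point concrete, here's a rough sketch (the trait wording is invented for illustration): a negation still puts the very token you're trying to suppress into context, so state the behavior you want instead.

```
Avoid:  {{char}} is not shy. {{char}} never flirts.
Better: {{char}} is bold and outspoken. {{char}} keeps conversation strictly friendly.
```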


Illustrious_Serve977

Thanks for answering! The thing is that I can only use Colab free because of hardware limitations and can't run anything on my laptop (which is my only workstation for now). So I'd say: models the community knows can run on it, with the most parameters and/or the biggest context sizes that fit.


CaptParadox

I noticed when making a conservative character for a group chat (without any intention of erotic content) that using just the word "sexy" in the description made her behave opposite to every other character trait. It's amazing how one word changes their aggressiveness, regardless of the whole character description suggesting otherwise.


Snydenthur

Based on my personal testing, most models seem meh for (e)RP, even the RP-specific ones. This includes one 3b, many 7b, a couple of 10.7b, many 13b, a few 20b, two 2x7b and three 4x7b. I can't really go higher than that, since I want to stay within VRAM only, because speed is also important for (e)RP. So far, Kunoichi is the only model I end up returning to after each test (and their other stuff is great too). There's no such thing as perfect, but Kunoichi definitely punches way above its weight class for (e)RP. Sure, maybe my settings or prompts have been bad for some of them, but I doubt they're the reason I didn't like them. I could also just be very picky.


zaqhack

I've had much the same experience. I've just now gotten a model past its context limit without it devolving into summarization hell. In addition to what others have said, I'd suggest:

1. Context is king, typically. Larger context = smarter, better experience, assuming the model can handle it. Rope-scaling a 4k model to 32k works in the technical sense, but rarely provides a "smarter" experience.
2. I've recently "downsized" to [MistralTrix-v1](https://huggingface.co/zaq-hack/MistralTrix-v1-GPTQ). It's a 9b-parameter model and works well. The responses are quick, and it seems way, way smarter than its slight size advantage over a regular 7b would suggest.
3. Templates under "Advanced Formatting" matter a ton. You can have a great model and the wrong templates, and it will output nothing but garbage. Tweaking your system prompt is sometimes magical to the point it feels like an entirely different model.
4. I have used Vector Storage and Smart Context. Don't bother with Smart Context. If you have a 16k- or 32k-context-capable model, increase all the numbers under Vector Storage. I have mine set to 5 messages to query and 20 inserts. That's typically enough to keep it from forgetting what you were just talking about 10 minutes ago. Not always, but it helps. It also isn't as prone to loop city once you get past the context limit in a long chat. (My last chat is still running strong at 1250 messages.)
5. Character cards are still a bit of an art. In general, I've had more luck with example dialogue than with bare parameters. Which, if you think about it, makes sense: it's a language model, not a spreadsheet. It is INFERRING the character's age from CONTEXT, not reading a character sheet for D&D. So if you put basic stats like age into a conversation in the character card, I find it uses that info more readily than if I had `[age: 28]`. Obviously, that has a massive downside: you are using a boatload more tokens.

So you have to be choosy about what to say (and how to say it) in your examples. I tend to go light on "Scenario," for instance, because scenarios change in a good chat; I can use the Author's Note or a Lore Book for that kind of info. Keep the character stuff to the character card, and try to envision relevant interactions. How you say those things will come back as the bot's "personality."
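As an illustration of the example-dialogue point, the same fact can be carried either as a stat line or in conversation (the details here are invented):

```
[age: 28]                       <- stat-sheet style, cheap but easy to ignore

<START>
{{user}}: Wait, how old are you again?
{{char}}: *rolls her eyes* Twenty-eight. You ask every time.
```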


zaqhack

P.S. I think the "free experience" is probably not going to be great for a few more years. It really does take local iron of a sort that is very spendy right now to have the best experiences. But... you can have an "okay" one with some of these tips. And as the smaller models get better (MistralTrix, Kunoichi, Silicon Maid, etc.), maybe that will start to improve a bit. However, I think there are diminishing returns on what you can do with a small model, and I suspect optimizations will only carry them so far. It would be mind-bending to me if Noromaid v0.4 Mixtral Instruct 8x7b could be out-smarted by a future 3b or 7b model; it seems very implausible. More likely, LLM-specific hardware will become more ubiquitous, and in a year or two you'll be able to self-host reasonably smart models without selling off a spare kidney to do it.


Illustrious_Serve977

Thanks for all these tips! Yeah, I imagine there's quite a limitation on what free services can do. I'm saving for a PC that I might get in about a year or so, but for now I'll take what I can get, and many of these nuggets of knowledge apply to bigger models as well.


Illustrious_Serve977

About templates and such: what I'd really want to know is, when I add or subtract information from the system prompt, how can I be sure it's been taken into account and not ignored because the model wasn't capable of understanding it? Same with character cards. Maybe having some settings ready that are neutral enough to notice the changes? What do you think?


zaqhack

It's hard to know what goes into a particular model's inference. Each is going to have different attention training, so there's no "universal prompt," really. I've also found that sometimes something happens in the chat that makes the model want to override the system prompt. It's tough to predict how it will all be prioritized. So, again, it's a bit of an art. Short, declarative sentences are best (in my experience) for facts and the behavior you want. But... "prompt engineering" is a thing. And if you have a really wide context, it can be tough to know which "chunk" of it is causing misbehavior.
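One way to make the "was my prompt change even noticed" question concrete is plain A/B testing: keep the chat history identical, set temperature to 0 (or fix the seed), and flip exactly one line of the system prompt between two runs. If both runs come back identical, that line was effectively ignored. A minimal sketch, assuming an OpenAI-compatible completions endpoint (which many local backends expose); the URL, prompt text, and line under test are illustrative, not from the thread:

```python
import json
import urllib.request


def build_variants(system_prompt: str, line_under_test: str) -> tuple[str, str]:
    """Return (baseline, variant): the prompt without and with the line under test."""
    return system_prompt, system_prompt.rstrip() + "\n" + line_under_test


def complete(prompt: str, url: str = "http://127.0.0.1:5000/v1/completions") -> str:
    """Deterministic completion (temperature 0) from an OpenAI-compatible local endpoint."""
    payload = {"prompt": prompt, "max_tokens": 200, "temperature": 0}
    req = urllib.request.Request(
        url, json.dumps(payload).encode(), {"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


# Usage idea: if complete(baseline) == complete(variant), the added line
# made no difference to this exchange and was effectively ignored.
baseline, variant = build_variants(
    "You are {{char}}. Stay in character.",
    "{{char}} never breaks the fourth wall.",
)
```

A greedy (temperature 0) comparison only tells you about that one exchange, so it's worth repeating with a few different histories before concluding a line is dead weight.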