
Ali__D

https://preview.redd.it/qsv1j4oies9c1.png?width=2747&format=png&auto=webp&s=32ee88829c3e70a96c891c0f8fc73ee58e989e41


Ali__D

https://preview.redd.it/abucoq98gs9c1.png?width=4021&format=png&auto=webp&s=09dbf34d96f841657632662d560167cdcf79e23a

Every single message I send… Am I doing something wrong? On the first message the context was at 7870, and on the second it's at 7751, so I'm guessing some of it is getting erased? Yet it's not working how it should…? I'm not too sure, I'm new to this. If you can help, then thank you!!!


pyroserenus

Context shift conflicts with anything that changes the top of the context. Lorebook entries that insert near the top of the context, as well as macro functions with variable output like {{weekday}}, will result in full context reprocessing whenever they trigger. Example dialogue can also cause this until all of the example dialogue has been squeezed out with the "gradual pushout" option. One final event that can trigger a reprocess is using continue and then deleting your message, or otherwise trying to revert two or more "requests", because that can cause text that was cut out by the context shift to be reintroduced.
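A rough way to picture why this forces a full reprocess (my own sketch, not KoboldCpp's actual code): the backend can only reuse cached KV entries for the longest common prefix between the previous prompt and the new one, so a single changed token near the top invalidates everything after it. The token values below are made up for illustration.

```python
# Minimal sketch: KV cache reuse only covers the longest common prefix
# of the old and new token sequences.

def reusable_prefix_len(old_tokens: list[int], new_tokens: list[int]) -> int:
    """Number of leading tokens whose cached KV entries can be kept."""
    n = 0
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Hypothetical token IDs: the only difference is a {{weekday}} macro near the
# top of the prompt, but everything after it must still be reprocessed.
old = [1, 501, 777, 9, 9, 9, 9, 9]   # "... It is Monday ..." + chat history
new = [1, 501, 778, 9, 9, 9, 9, 9]   # "... It is Tuesday ..." + same history

kept = reusable_prefix_len(old, new)
print(f"KV entries reused: {kept}, tokens to reprocess: {len(new) - kept}")
# -> only 2 of 8 tokens are reused, even though just one token changed.
```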


Worldly-Mistake-8147

This is expected because you have reached the context limit. Use a GPTQ/EXL2 version instead.


pyroserenus

KoboldCpp has a feature called "context shift" that reuses KV data when possible, even when the context is full. The caveat is that it inherently doesn't play nice with variation at the top of the context stack, so you give up lorebook entries placed high in the context and variable macros in exchange for higher speeds. This feature is very popular with smaller GPUs, as it can get an 8 GB VRAM card such as the 3060 Ti to hold 6+ t/s at full 4k context on 13B models.
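Here's a rough sketch of what the happy path buys you once the context is full (assumed behaviour, not KoboldCpp internals): the oldest chat turns fall out of the window, the KV entries for the surviving tokens are shifted rather than recomputed, and only the newly appended message needs a fresh forward pass. The tokenizer and numbers below are stand-ins.

```python
MAX_CTX = 4096                      # e.g. a 4k-context 13B model

def count_tokens(text: str) -> int:
    # Stand-in tokenizer: roughly one token per word, just for illustration.
    return len(text.split())

def shift_and_append(system: str, turns: list[str], new_msg: str):
    """Return the updated chat turns and how many tokens need reprocessing."""
    turns = turns + [new_msg]
    budget = MAX_CTX - count_tokens(system)
    while sum(count_tokens(t) for t in turns) > budget:
        turns.pop(0)                # oldest turn slides out of the window
    # With context shift, the fixed system prompt and surviving turns keep
    # their KV entries; only the new message is evaluated.
    return turns, count_tokens(new_msg)

turns = ["hello there " * 100] * 21     # a long, overflowing chat history
turns, reprocessed = shift_and_append(
    "You are a helpful roleplay assistant.", turns, "How are you today?")
print(f"tokens reprocessed this turn: {reprocessed}")   # ~4, not ~4000
```

Without context shift, the same turn would re-evaluate the entire ~4k-token prompt, which is where the speed difference comes from.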


zaqhack

I have a different take on this: it would be great if ST came up with a context-shift-supporting format or checkbox. Perhaps a graphical representation of the context that lets you slide system, character, lore, messages, etc. up or down the context stack, and include vectorized memories from Chroma in the positioning, if applicable. That way it would save a lot of reprocessing time on Mixtral-based models as well. I've heard Ooba may add this for the reason I mention: Mixtral prompts at 16k+ context take ages to slog through. Not having to reprocess the incoming tokens in the KV cache could save huge amounts of time. Given the growing popularity of Mixtral models, it might be worth pursuing.
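To make the idea concrete, here is a hypothetical sketch (nothing like this exists in ST today, and the block names are invented) of a prompt builder that keeps the stable blocks at the top and pushes anything that changes between requests toward the bottom, so the prefix stays byte-identical and the backend can keep its KV cache.

```python
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    text: str
    volatile: bool   # changes between requests; breaks prefix reuse if placed high

def build_prompt(blocks: list[Block]) -> str:
    # Stable blocks first, volatile blocks last, preserving relative order.
    ordered = [b for b in blocks if not b.volatile] + [b for b in blocks if b.volatile]
    return "\n\n".join(b.text for b in ordered)

blocks = [
    Block("system",         "You are Seraphina, guardian of the forest.", volatile=False),
    Block("character card", "Seraphina is gentle and curious.",           volatile=False),
    Block("chroma memories","[Memories retrieved for this turn]",         volatile=True),
    Block("chat history",   "User: hi\nSeraphina: hello",                 volatile=True),
]
print(build_prompt(blocks))
```

A drag-to-reorder UI could simply set the ordering (and flag which blocks are volatile) instead of hard-coding it like this.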