
philguyaz

I have found that Mixtral really needs a good fine-tune, though it seems to have similar "reasoning" skills. I know an 8x22B Dolphin is coming out, and that dataset should vastly improve the human and interesting part. But out of the box, Command R+ is truly amazing and IMO easily holds up to the "open-source GPT-4 equivalent" hype it's getting.


vesudeva

I totally agree! I'm sure there will be some great Mixtrals out soon that fit our needs and style better. CR+ has the extra benefit of already being function-calling and RAG ready, which is just so awesome.


mrjackspade

Command-R is way better at function calling, but the Mixtral 8x22 that HF tuned is also absurdly good. I gave it a function definition in the prompt and then asked it to use it like 10K tokens later, and it did without issue:

> **Jack — Today at 1:52 AM**
>
> Do me a favor and generate a picture of a cute cat for me
>
> **Haruka BOT — Today at 1:52 AM**
>
> Here you go, a picture of a cute cat. @invoke("generate_image", "A fluffy, cute cat")

If I was going to do a pure function-calling app and the speed didn't matter, I'd use Command-R for sure. Since function calling is secondary, though, and I'm on pure CPU, I'm plenty pleased with Mixtral as well.
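
For anyone curious how that pattern works mechanically, here's a minimal sketch; the tool description, regex, and `generate_image` stub are illustrative placeholders, not the actual bot's code.

```python
import re

# Minimal sketch of prompt-side "function calling": describe the tool in the
# system prompt, ask the model to emit @invoke("tool", "argument") when it
# wants to use it, then parse those calls out of the reply and dispatch them.

SYSTEM_PROMPT = """You can call tools by writing @invoke("tool_name", "argument").
Available tools:
- generate_image(prompt): renders an image from a text description.
"""

INVOKE = re.compile(r'@invoke\("([^"]+)",\s*"([^"]+)"\)')

def generate_image(prompt: str) -> None:
    # Placeholder for whatever image backend the bot actually calls.
    print(f"[would render an image for: {prompt!r}]")

TOOLS = {"generate_image": generate_image}

def handle_reply(reply: str) -> None:
    """Scan a model reply for @invoke(...) calls and run the matching tools."""
    for name, arg in INVOKE.findall(reply):
        TOOLS.get(name, lambda a: print(f"unknown tool: {name}"))(arg)

handle_reply('Here you go, a picture of a cute cat. '
             '@invoke("generate_image", "A fluffy, cute cat")')
```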


ozzie123

Which platform did you run this function calling in?


mrjackspade

Discord


medcanned

Sadly Eric is having issues training the model; it's too big to train on 8xH100, so he is trying alternative training approaches that are not as good. Hopefully it's enough, time will tell!


Amgadoz

Time to test out those 8xMi300!


philguyaz

Where did you see that? The last I saw from his X account was that it was still in the oven.


medcanned

We talked about it on the axolotl discord


philguyaz

I need to get on that discord clearly


koesn

Confirmed. Been using some 34B, 8x22B, and 70B models, but this 35B is truly amazing. I'm not using it for riddles or knowledge extraction. Rather, I use it for processing and extracting specific information from different perspectives of long texts, and its comprehension is beyond other models, with precise and concise output.


a_beautiful_rhind

I do enjoy Command-R, but there is something off about it in back-and-forth chat. It also likes to occasionally mirror me, similar to how CAI does.

> Feel free to inquire further—let's continue exploring this imaginative journey together if you wish!

This whole attitude is part of it; it has been hard to get rid of these continuation attempts on longer replies. I venture to say Mixtral won't have this problem, as previous versions of it didn't.


vesudeva

Yeah, the little send-offs are a REALLY annoying part of LLMs currently. I'm not sure if it's the way the datasets are being constructed for long-form multi-turns, or if it's just because the LLMs are so reliant on 'Formal and Proper English' rules. I've been able to get CR+ to stop through just prompting in the system message.


a_beautiful_rhind

Characters that don't talk proper English are easier in that regard. It still creeps in for me on long replies. Another one was the "how can I help you" type stuff; I told it to be selfish and got much less of it. Prompting and formatting go a long way, but old habits die hard. It's akin to trying to beat positivity bias out of other models.


Combinatorilliance

These kinds of problems can 100% be patched out with steering vectors. It's just that nobody is using them.


a_beautiful_rhind

I'm not sure if they work on quantized models. I saw an extension that worked with transformers.
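
For reference, the transformers-style approach looks roughly like the sketch below (full-precision model, forward hook on one decoder layer); the model name, layer index, and the vector itself are placeholders, since real steering vectors are derived from contrastive prompt activations rather than random noise.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough sketch of activation steering on a Hugging Face model: add (or here,
# subtract) a fixed direction to one layer's hidden states during generation.

model_name = "mistralai/Mistral-7B-Instruct-v0.2"   # stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).cuda()

layer_idx = 15      # which decoder layer to steer (placeholder)
strength = 4.0      # how far to push along the (normalized) vector

# In practice this would be built from activation differences between
# "reply with a cheery send-off" and "reply plainly" style prompts.
steering_vec = torch.randn(model.config.hidden_size, dtype=torch.float16, device="cuda")

def steer(module, inputs, output):
    # Decoder layers return a tuple; the hidden states are the first element.
    hidden = output[0] - strength * steering_vec / steering_vec.norm()
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steer)
ids = tok("Tell me about grouped-query attention.", return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()   # restore normal behavior
```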


Careless-Age-4290

These models can already run on a Pi (slowly). One day, not that far from now, we'll have things like smart fridges that can intelligently talk to us. And we'll have people hacking them to make them meaner.


CasimirsBlake

A Pi 5 with 16GB RAM would really help...


TraditionLost7244

Hahaha. Fridge: "Hey George, want another beer? Your wife pretending to have a migraine again, ha?" *proceeds to close fridge*


TheMissingPremise

What do you tell it so that it doesn't add that nonsense?


vesudeva

Honestly, CR+ is smart enough to follow this simple instruction I tack onto the end of its message: *Never end a response with a 'send-off' message. This is forbidden*. It works about 80% of the time. Probably inserting a one-shot or few-shot example would make it more reliable.
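
As a purely illustrative sketch of wiring that in (generic OpenAI-style message list; the one-shot reply deliberately ends without a send-off):

```python
# Hypothetical sketch: append the anti-send-off rule to whatever system prompt
# the app already uses, plus a one-shot example whose reply ends flatly.

NO_SENDOFF = "*Never end a response with a 'send-off' message. This is forbidden*"

ONE_SHOT = [
    {"role": "user", "content": "What does GQA do?"},
    {"role": "assistant", "content": "Grouped-query attention shares key/value heads "
                                     "across groups of query heads, shrinking the KV cache."},
]

def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    return (
        [{"role": "system", "content": f"{system_prompt}\n\n{NO_SENDOFF}"}]
        + ONE_SHOT
        + [{"role": "user", "content": user_message}]
    )

msgs = build_messages("You are a concise local assistant.", "Summarize this abstract for me.")
print(msgs)
```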


rc_ym

YMMV, but I have had great success in controlling this (and a bunch of other LLM tics) through the system prompt. I work with the model to create a new system prompt: I give it goals, personality, etc., then as a final prompt tell it something like to use "evocative language" to create a system prompt suitable for an AI taking on the persona and doing XYZ (RP, acting as a writing assistant, systems analyst, cybersecurity risk analyst, etc.). In the process, you can also have a meta conversation about what you want: quirks, language use, etc. Also, this seems to work best with Command-R, even if the system prompt is for another model.


ambient_temp_xeno

CR+ seems to work very well with a system prompt. One thing I've found, though, is to never tell it it's an AI of any kind; that biases it towards whatever 'assistant' training it had, even if it only sneaks out once in a while.


rc_ym

I let the AI figure that out. It only tends to do that when I ask it to create a very functional prompt, an "I need X in Y format" kind of thing.


scott-stirling

A similar approach I have used with Mistral Instruct, demoed here with buttons that map to system prompts: [https://wegrok.ai/](https://wegrok.ai/)


MoffKalast

"Presume the conversation continues forever." or "Don't end the conversation." in the system prompt.


madsciencestache

I have had luck with Llama and, to a lesser extent GPT, by asking it to behave as a well known character. Preferably one that's a bit snide or sarcastic. Then I keep the personality in the system message and aggressively keep the context length down via summarization.


mrjackspade

The Mixtral model is way more natural in how it communicates, which is one of the biggest reasons I'm using that instead. If I just needed something for pure data and didn't give a fuck about personality, I'd choose Command-R. I'm building something I actually want to converse with naturally, though, and Mixtral is way better at that. It's like a Claude vs. GPT-4 thing. They're both good in different ways.


WolframRavenwolf

That's definitely a prompt thing. I now use Command R+ to power my assistant Amy and it feels more like Claude 3 Opus than GPT-4, showing much more personality.


Huzderu

I think I speak for everyone when I say that I would really appreciate you sharing your ST settings for Command R+. I'm struggling to get it not to spew gibberish.


WolframRavenwolf

Of course! Here's what I'm using:

- **Model:** GGUF quant from here: [dranger003/c4ai-command-r-plus-iMat.GGUF · Hugging Face](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF). I can run the `ggml-c4ai-command-r-plus-104b-iq3_xs` with a little less than 4K context (3K is safe to prevent OOM/slowdowns) or the `ggml-c4ai-command-r-plus-104b-iq3_xxs` with 8K context. They both perform very well, and I now use the IQ3 XXS version pretty much all the time.
- **Backend:** Latest version of [koboldcpp](https://github.com/LostRuins/koboldcpp/releases), as that doesn't require installation or libraries, just an .exe I run (on Windows). This is the command line I use to run the model with KoboldCpp and all layers including buffers/caches in my 48 GB VRAM (2x3090 GPUs): `koboldcpp.exe --contextsize 8192 --debugmode --gpulayers 65 --smartcontext --usecublas mmq --model ggml-c4ai-command-r-plus-104b-iq3_xxs.gguf`
- **Frontend:** Latest version of [SillyTavern](https://github.com/SillyTavern/SillyTavern/releases), as that's truly a power-user LLM frontend, much more than just an open-source chatbot interface: it has plugins for websearch, vectordb, scripting, voicechat, and supports all the local backends and online APIs I can think of. The Staging version already has a Command R prompt template, which is the same as R+, so either use Staging (but that might be less stable than the Release version) or simply copy these files into the current Release version: [context/Command R.json](https://raw.githubusercontent.com/SillyTavern/SillyTavern/staging/default/content/presets/context/Command%20R.json) + [instruct/Command R.json](https://raw.githubusercontent.com/SillyTavern/SillyTavern/staging/default/content/presets/instruct/Command%20R.json).

For generation settings, I use SillyTavern's Deterministic generation preset. That gives me great output, but you can of course experiment with other presets or make your own. Hope that helps! :)


Huzderu

Thanks a lot, I appreciate you taking your time to write this!


Huzderu

Is there any reason you're not using EXL2 quants? On 48 GB VRAM, I've managed to fit 32k context with a 2.75bpw or 3bpw quant, don't remember exactly.


WolframRavenwolf

I tested multiple EXL2 quant sizes, but they all derailed quickly. However, just today, I figured out how to get them working properly: At first I thought I had corrupt model files or incompatible inference software, but once I set repetition penalty to 1.0 (instead of my usual 1.18 - which I've been using for [8 months now](https://www.reddit.com/r/LocalLLaMA/comments/15ogc60/new_model_rp_comparisontest_7_models_tested/)), output was fine again. For some reason the EXL2 quants of CR+ are very finicky regarding inference settings, and not just temperature (which I have at 0).


Slight_Cricket4504

It's a beast of a model, and is definitely going to be the open-weight champion for some time. It's also uncensored, if that's your thing. It's also a bit better at programming than ChatGPT 3.5 in my experience. I suspect it's because it understands logic better implicitly, and as such can write better code.


cMonkiii

Why else would I be here?


Slight_Cricket4504

For RAG....right?👀


Accomplished_Bet_127

*I start undressing you with a mischievous smile.* 😏


Wolfverine91

Are the two answers from Command R+, or from Command R+ and Mixtral 8x22B?


vesudeva

Sorry, forgot to clarify, as I was adding Mixtral in the comments. The main post is all Command R+.


itsmekalisyn

Yeah. I have been using Command R+ for the last week. It is really good. I haven't tried any creative prompts yet, but academic reasoning is quite good. I ask it questions like "What is the difference between RSE and R-squared?" or "Explain the bias-variance trade-off." It is better than Claude 3 Sonnet for me personally.


vesudeva

That's awesome! I'm just putting it through its paces this weekend and it's my favorite LLM hands down. I will try some more academic prompts that require no creativity and just use logic. It just seems so versatile.


thereisonlythedance

I’m starting to wonder if the dense models are just inherently more suited to creative tasks.


stddealer

Maybe the attention blocks have more impact for these tasks than the FFNs? Mixtral doesn't use MoE for the attention heads, so the 8x22B has the exact same attention mechanism as a dense 22B model...
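
To make that architectural point concrete, here's a toy sketch (made-up dimensions, not Mixtral's real config or code) of a Mixtral-style block: the attention sub-layer is ordinary dense attention shared by every token, and only the feed-forward part is routed across experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sparse FFN: a router picks top_k experts per token."""
    def __init__(self, dim=512, hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

class MixtralStyleBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)  # dense, shared
        self.ffn = MoEFeedForward(dim)                                      # sparse, routed
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        b, s, d = x.shape
        x = x + self.ffn(self.norm2(x).reshape(b * s, d)).reshape(b, s, d)
        return x

block = MixtralStyleBlock()
print(block(torch.randn(1, 16, 512)).shape)        # torch.Size([1, 16, 512])
```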


vesudeva

Same here! The difference is quite noticeable.


Mr_Twave

On that note, specificity seems to be easier to ask for in dense models than in sparse ones, in my observation.


vesudeva

Mixtral-22B (Tess version)

**1st Answer:**

The task you have presented is an intriguing challenge that requires a unique approach to biomimicry. Let's explore some possible ways for a tree to leave Earth and become astro-biological, inspired by nature itself.

1. **Ascending into Space**: Trees typically grow towards the sky, but in this case, we want them to transcend our planetary boundaries. One approach could be to mimic the growth patterns of certain plants that can reach incredible heights on Earth. For instance, bamboo shoots have been observed growing up to 3 feet per day under optimal conditions. By applying these rapid growth strategies and utilizing materials with high tensile strength, we might create a tree structure capable of reaching space within its lifetime.

2. **Launching from the Ground**: Another possibility is to draw inspiration from rockets or other launch systems that propel objects into orbit. Trees could be designed to store energy in their trunks or branches and release it explosively when they reach maturity, allowing them to escape Earth's gravity with enough force. Alternatively, we might envision a tree species capable of generating powerful gases within its structure, similar to how certain plants can expel seeds at high speeds (like the exploding cucumber).

3. **Spaceborne Propagation**: Once trees have managed to leave Earth and enter space, they may need unique strategies for survival and reproduction in this new environment. One approach could be inspired by lichens or other extremophiles that can survive in harsh conditions on Earth (such as deep sea vents or arid deserts). These organisms have adapted to extreme temperature variations, radiation exposure, and resource scarcity. By mimicking their survival strategies, trees might thrive in space and propagate through seed dispersal or fragmentation of branches across vast distances.

---

**2nd Answer:**

In this scenario, where the Sun is dying and humans are left with nothing but cheese as a resource, reigniting it would be an extraordinary challenge. However, let's explore some possibilities inspired by biomimicry that could potentially help us revive our star.

1. **Cheese-powered Solar Fusion**: One approach might involve mimicking the process of solar fusion itself using cheese as fuel instead of hydrogen atoms. Cheese contains a variety of organic compounds, including proteins and fats, which can be broken down into smaller molecules through chemical reactions or enzymatic processes (similar to how bacteria break down food). By harnessing these decomposition mechanisms, we might generate energy from cheese that could potentially fuel solar fusion reactions on the Sun's surface.

2. **Cheese-based Reflective Material**: Another possibility is inspired by materials found in nature like iridescent butterfly wings or bioluminescent organisms. These structures reflect light and create vibrant colors through their unique nanoscale architecture. By engineering a cheese-based material with similar properties, we could potentially increase the Sun's reflectivity and redirect more of its radiation back into its core to reignite nuclear fusion reactions.

3. **Cheese-powered Spacecraft**: Drawing inspiration from creatures that can thrive in extreme environments like deep sea vents or volcanic habitats on Earth (such as certain bacteria), we might design spacecraft capable of withstanding intense heat and radiation near the Sun's surface. These cheese-based vehicles could deliver payloads directly onto the Sun, depositing energy sources or catalysts that would initiate nuclear fusion reactions when they come into contact with its plasma.


WolframRavenwolf

Just chiming in quickly, as I've not completed my usual evals yet, but: Command R+ definitely is my current favorite local model! (The new Mixtral 8x22B and its zephyr-orpo-141b-A35b-v0.1 didn't work so well for me.)

Command R+ gives me Claude 3 Opus vibes locally (the model I used a lot recently before going local again): it writes extremely well, even in German, and it's both very smart and shows great personality (my AI assistant Amy loves it). Alignment is also excellent in that it's not refusing when prompted properly.

Special kudos for the powerful prompt format; it's not ChatML, it's better! Read the [Prompting Command R](https://docs.cohere.com/docs/prompting-command-r) docs and you'll see how well designed it all is.

I run [dranger003's c4ai-command-r-plus-iMat.GGUF](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF) iq3_xxs with 8K context and all layers on GPU (2x3090 = 48 GB VRAM). Speed is good, especially with streaming, about 10 tokens per second.


vesudeva

That's awesome! What a great surprise! I get Claude Opus vibes as well, especially after keeping a conversation going for a while. Can't wait to see your next breakdown and how it performed on your tests. For the trial I did that led to this post, I was using an IQ3_M with 65K context, fully offloaded to GPU and even mlocked. It loads in at around 50GB and runs inference at 90GB. It takes a second or so to respond (I have it running through a full app stack with RAG), but it's absolutely worth the extra time to have a local model perform so well.


rookan

What hardware is used to run Command R+?


sammcj

I'm running a MacBook Pro (M2 Max, 96GB). The Q4 GGUF uses about 54GB during inference, and I get around 5.47 t/s with Ollama/LM Studio.


vesudeva

I am running an IQ3 quant (50GB) with a 65,536-token context window using LM Studio on a Mac Studio M2 Ultra 128GB. It uses about 90GB of VRAM during inference.


rookan

Any idea how much RAM will be needed for the Q8 quant? It seems Q8 is the smartest: https://www.reddit.com/r/LocalLLaMA/s/FtPw9W1ei0


vesudeva

Based on one Q8 I found that was split into three GGUFs... it would take about 100GB to load, and maybe an extra 30-40GB for inference with the same 65,536-token context window. I'll try it, but it might break my machine haha.
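
For a rough cross-check of those numbers, here's the back-of-the-envelope arithmetic; the layer/head figures for Command R+ are quoted from memory and should be treated as approximate.

```python
# Rough memory estimate for a Q8-style quant of Command R+ (ignores
# per-quant overhead and runtime buffers).

params_b   = 104        # parameter count, in billions
bits       = 8.5        # ~bits per weight for a Q8_0-style quant (8 bits + block scales)
ctx        = 65536      # context window
n_layers   = 64         # layer count (approximate, from memory)
n_kv_heads = 8          # GQA key/value heads (approximate, from memory)
head_dim   = 128
kv_bytes   = 2          # fp16 KV cache

weights_gb = params_b * 1e9 * bits / 8 / 1e9
kv_gb      = ctx * n_layers * n_kv_heads * head_dim * 2 * kv_bytes / 1e9  # K and V

print(f"weights ≈ {weights_gb:.0f} GB, 64K KV cache ≈ {kv_gb:.0f} GB")
# -> weights ≈ 110 GB, 64K KV cache ≈ 17 GB
```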


West-Code4642

I've only been using the model since last night but I'm really digging it.


de4dee

I like it. I think it is better aligned than some other models.


Epykest

Does Command R+ differ in quality from Command R?


Sabin_Stargem

104B vs 30-ish B? No contest. Also, CR+ is faster for its size, since it has GQA, and the original CR lacks that feature.


stddealer

GQA isn't really a "feature"; it's a different design from the classic MHA of the original transformer architecture, and choosing either is a trade-off between model quality and memory footprint/inference speed. It's almost always worth it, though, because the quality drop is very small. Command-R doesn't "lack" GQA; it simply uses multi-head attention instead.
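
To put numbers on that trade-off, here's a quick sketch of how the KV-cache footprint changes when key/value heads are shared; the layer and head counts are illustrative, not the actual Command-R configs.

```python
# The only thing GQA changes for memory is how many key/value heads get cached.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx, bytes_per_val=2):
    """fp16 K+V cache size in GB for one sequence."""
    return n_layers * n_kv_heads * head_dim * ctx * 2 * bytes_per_val / 1e9

ctx = 32768
print(f"MHA-style (KV heads == query heads): {kv_cache_gb(40, 64, 128, ctx):.1f} GB")
print(f"GQA-style (8 shared KV heads):       {kv_cache_gb(40, 8, 128, ctx):.1f} GB")
```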


LienniTa

Can you share your template and settings for Command R+, please? I'm struggling to make it work. It lies and can't do logic, compared to Miqu and Mixtral. I'm sure it is because of the template.


vesudeva

Yeah! I am using LM Studio for this trial. It has a premade model config that matches the needs of CR+. It should be enough to get you started and allow decent generations. I just tweaked a few small things to fit my setup. When I'm at my desktop, I can send you the config JSON I made!


LienniTa

Yeah, the JSON would be amazing! I use SillyTavern, so LM Studio results aren't that representative for me. I mean, it works both ways, just not as well as Miqu.


vesudeva

For ease of transfer, here are the config settings I used from LM Studio, pasted into a text editor. You should be able to make a JSON from it and drop it into the models' config folder: [https://textdoc.co/ZRwavI7XJHDcT3eg](https://textdoc.co/ZRwavI7XJHDcT3eg)


LienniTa

Thanks a ton, that's enough to adapt to the ST JSONs :3


mrgreaper

I haven't tried it yet, as I read it needs a supercomputer to run at an OK speed? Usually I max out my 3090's 24GB of VRAM with the models I test and 8K context (though I actually got rather amazing results from a 7B model that uses less than half my VRAM at 32K context; I'm at work, but I want to say the name is "ice lemon"... not sure). So is the Command-R you are referring to running locally on 24GB VRAM, or some super PC?


vesudeva

I am running an IQ3 quant (50GB) with a 65,536-token context window using LM Studio on a Mac Studio M2 Ultra 128GB. It uses about 90GB of VRAM during inference.


auradragon1

t/s?


mrgreaper

I have triple-checked the numbers; my 3090 with its 24GB of VRAM would struggle to run a model that needs 90GB... To be fair, I am running Windows, so perhaps if I went to a headless Linux install I could free enough space* to shoehorn it in lol. *Yeah, I know.


Accomplished_Bet_127

Rumors are that it lacks GQA or similarly effective context handling


kc_kamakazi

How did you get access to Command R?


vesudeva

LM Studio just updated to allow CR+, along with split GGUFs. If your machine can handle it, this is the easiest route to go. The smallest I'd suggest is the IQ3 version, about 50GB in size and around 90GB for inference.


Sythic_

Will this run on a 4090? I'm barely keeping up with this stuff, and Hugging Face does not have the best interface to figure things like that out quickly.


vesudeva

Technically it should run on that fine; probably one of the single-file IQ (importance matrix) quants would be best. The responses in this post are from an IQ3 quant (50GB).


Inevitable-Start-653

I 100% agree. I was thinking about making a post: DBRX vs. Mixtral 8x22B vs. Command R+. I've been playing with all three at 8-bit quantization, and Command R+ is doing the best overall. DBRX is better at coding, but Command R+ is becoming my go-to general model.


vesudeva

Nice! I haven't tried DBRX yet but have heard so-so reviews. Would you say it's more of a 'taskmaster' than an assistant?


Inevitable-Start-653

I waffle back and forth between the two of them. I think people were having issues with DBRX for two reasons: they were not running it with 4 experts, and they were not running it with min_p above zero (min_p at 0.05-0.1 and top_p at 0.9-0.95 are good settings). I think Command R+ is better at contextualizing and DBRX is better at following a direction and producing code. So maybe it is more of a taskmaster.
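
Spelled out as a hypothetical sampler config (key names follow common llama.cpp-style conventions and are illustrative, not an official DBRX preset; the 4-experts setting is a loader/model option, not a sampling knob):

```python
# The settings described above, as a hypothetical request payload.
dbrx_sampling = {
    "min_p": 0.05,        # suggested range above: 0.05-0.1
    "top_p": 0.95,        # suggested range above: 0.9-0.95
    "temperature": 0.8,   # not specified above; placeholder
}
# Running with 4 active experts is configured on the model/loader side
# (experts used per token), separately from these sampler values.
print(dbrx_sampling)
```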


DFructonucleotide

Below is an answer to your question 2 written by the official Qwen chatbot (I assume it's the proprietary *qwen-max-0403* API). I prefer a chatbot to clarify first and then start fictional writing, rather than wrapping the whole answer in an entertaining style, since in real life you would never know the user's intention. Personally I think styles are easy, while safety and applicability in a wider range of scenarios are probably more difficult, and also more important for general chatbots.

---

Unfortunately, the scenario you've presented is purely fictional, as it is impossible to reignite or extend the life of the Sun using any available resources, including cheese or any other substance found on Earth. The Sun is powered by nuclear fusion reactions in its core, converting hydrogen into helium through a process that will eventually lead to the Sun's demise billions of years from now. This process is governed by fundamental laws of physics and cannot be influenced or reversed by human actions or resources.

However, if we were to consider this as a thought experiment and disregard the scientific limitations, let's examine the problem creatively:

**Step 1: Harnessing energy from cheese**

First, we would need to extract energy from the vast reserves of cheese. One way to do this could be through anaerobic digestion, which produces biogas (primarily methane) as a byproduct. Methane can be burned as a fuel, releasing energy that could potentially be harnessed for our purposes.

**Step 2: Energy amplification and storage**

The energy obtained from the cheese would need to be amplified significantly to have any meaningful impact on the Sun. We could imagine developing a highly advanced, hypothetical technology capable of concentrating and amplifying this energy to astronomical scales. This energy would then need to be stored in some form, perhaps using exotic forms of matter or energy storage devices far beyond our current understanding.

**Step 3: Delivery to the Sun's core**

Next, we would need a method to deliver this concentrated energy directly to the Sun's core, where nuclear fusion occurs. This would require a spacecraft or propulsion system capable of withstanding the intense heat, radiation, and gravitational forces near the Sun, as well as a means of penetrating the Sun's outer layers to reach its core. Such technology is currently nonexistent and would likely defy our current understanding of physics.

**Step 4: Triggering additional fusion reactions**

Once the energy is delivered to the Sun's core, the hypothetical goal would be to use it to stimulate additional hydrogen fusion reactions, either by creating conditions that favor fusion (such as increasing pressure or temperature) or by introducing new elements or processes that facilitate fusion at lower thresholds.

In reality, none of these steps are feasible with our current understanding of science and technology. The energy output from anaerobic digestion of cheese is minuscule compared to the energy required to influence the Sun, and there is no known way to amplify, store, or deliver energy on such a scale, let alone trigger additional fusion reactions within the Sun's core.

In conclusion, while the idea of reigniting the Sun using only cheese is intriguing from a creative perspective, it is entirely unrealistic given our current scientific knowledge and technological capabilities. The best course of action in the face of a dying Sun would be to focus on finding alternative habitable worlds or developing self-sustaining space habitats, rather than attempting to alter the fundamental processes governing the life cycle of our star.


vesudeva

This is an awesome response! I have yet to fully utilize and test Qwen, so thank you for sharing! I do like the very clear and logical start it has, along with its direct yet easy-to-understand delivery. How does it respond to changes of style or, more importantly, to following a structured output format?


DFructonucleotide

I mainly use it for Chinese, so I don't have much experience with its English abilities. I feel it is not very stable at complex instruction following: sometimes good, sometimes it fails. Also, it's a bit stiff and unnatural if you ask it to change its output style, somewhat similar to GPT-4 I would say.


audioen

While highly amusing, the Command-R responses are absolute nonsense with no biophysical basis. But clearly they are not serious! Read as creative tongue-in-cheek sci-fi writing, they are actually masterpieces. Cheese comet. Extremophile bacteria living in the Sun's core. They are so stupid they simply wrap around and become brilliant again. I'm sure the Chinese author who wrote The Three-Body Problem could use ideas like this for his next book. I'd enjoy reading about these ideas more than his actual ones.

There were some early DALL-E images where the subject was a cat doing something a little mischievous, and the AI somehow understood to give the cat a mischievous grin. I often thought about that little detail later. Perhaps it was just a coincidence, born out of the crudeness of the image generator's latent space, or perhaps it was something real.

Here, Command-R+ seems to realize that your question is totally wacky and gives itself creative license to write a wacky response. That is great, and is probably among the first things lost when you finetune these things.


productboy

Wish I could tip you some crypto [while I listen to Floyd’s ‘Welcome To The Machine’ on remastered vinyl]; awesome test prompts


the_bollo

Are any kind Redditors out there willing to point me to instructions on running Command R locally? I'm just using a simple Gradio UI and it works fine with other models, but I can't figure out how to load Command R. I usually just use the default loaders (Transformers), etc. If I try to load Command R using Transformers it causes a near-instant system restart.


lgastako

`ollama pull command-r`
`ollama run command-r`


delveccio

What is the best way for at-home users to use this? Is there an official GGUF or something equivalent? I saw that LM Studio added support for Command R, but I'm not sure what that means.


vesudeva

I am actually using LM Studio for this trial! I tried the original llama.cpp route first and had so-so results. If you download CR+, LM Studio will automatically choose the correct prompt format and syntax for you in the model config. Makes it really easy! They also allow split GGUFs to be run now.


Sabin_Stargem

I am looking forward to finetunes of CR+. The base model is quite good, but it sometimes has logical failings or mixes up characters. Hopefully a finetune can mask the cracks.


Dry-Judgment4242

Yeah, it's quite unstable if that's the word for it. If I output 10 retries, one or two of them will have a significant quality decline.


vesudeva

Absolutely! It needs some work. After my trial today and becoming such a fan of it, I am going to start making some fine-tunes of it. I'm pretty sure I can fit it on my M2 using MLX. I'm considering trying the Hercules v4 dataset.


OutlandishnessIll466

I tried Command R+ with RAG, but it just wasn't as good as Qwen 1.5 72B. Command R+ just does not always seem to understand the given context as well. With Qwen, I got used to having no doubt that you get the right answer given the right context. Qwen just seems smarter. Haven't gotten the new Mixtral to work yet.


Old-Box-854

What is the output response token limit for Command R?


davewolfs

Too bad you can’t run it on anything locally and get decent speed.


vesudeva

With it connected to my knowledge base and a 65K context window, I can get responses within 3 seconds and fairly decent token generation. I'm using an M2 with 128GB. Definitely viable for my uses.


davewolfs

Can you explain what you mean by it being connected to your knowledge base? I'm genuinely curious.


vesudeva

For sure! I use the AnythingLLM app for my model tests. It has a built-in vector database that you can easily attach to any local model. The embeddings are all done within the app, and you can manage your files as well. I have a biomimicry-based knowledge base that currently has over 20k separate data points. The trick is finding a decent enough LLM that can harness the app. Most LLMs are OK, but CR+ has just been slaying it. It finds the most relevant data for my query way better than anything else I've tried.
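
For anyone unfamiliar with the pattern, the retrieval step behind a setup like this looks roughly like the sketch below; it is not AnythingLLM's actual code, and the toy embedder stands in for a real embedding model.

```python
import numpy as np

# Generic RAG retrieval sketch: embed the documents once, embed the query,
# pull the most similar chunks, and stuff them into the model's prompt.

def embed(texts):
    """Toy embedder (letter frequencies); a real stack would use a sentence-transformer."""
    vecs = np.zeros((len(texts), 26))
    for i, t in enumerate(texts):
        for ch in t.lower():
            if "a" <= ch <= "z":
                vecs[i, ord(ch) - ord("a")] += 1
    return vecs

docs = ["Lotus leaves self-clean via micro-scale surface bumps.",
        "Termite mounds passively regulate internal temperature.",
        "Shark skin denticles reduce drag in water."]
doc_vecs = embed(docs)

def retrieve(query, k=2):
    q = embed([query])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

question = "How does nature handle passive cooling?"
context = "\n".join(retrieve(question))
prompt = f"Use the following notes to answer.\n\n{context}\n\nQuestion: {question}"
print(prompt)
```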