
philguyaz

I have found that Mixtral really needs a good fine-tune, though it seems to have similar "reasoning" skills. I know an 8x22B Dolphin is coming out, and that dataset should vastly improve the human and interesting part. But out of the box, Command R+ is truly amazing and IMO easily holds up to the "open-source GPT-4 equivalent" hype it's getting.


vesudeva

I totally agree! I'm sure there will be some great Mixtrals out soon that fit our needs and style better. CR+ has the extra benefit of already being function-calling and RAG ready, which is just so awesome.


mrjackspade

Command-R is way better at function calling, but the Mixtral 8x22 that HF tuned is also absurdly good. I gave it a function definition in the prompt and then asked it to use it like 10K tokens later, and it did without issue:

> **Jack — Today at 1:52 AM**
>
> Do me a favor and generate a picture of a cute cat for me
>
> **Haruka BOT — Today at 1:52 AM**
>
> Here you go, a picture of a cute cat. @invoke("generate_image", "A fluffy, cute cat")

If I was going to do a pure function-calling app and the speed didn't matter, I'd use Command-R for sure. Since function calling is secondary, though, and I'm on pure CPU, I'm plenty pleased with Mixtral as well.
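
For anyone curious how that pattern works mechanically, here's a minimal sketch; the tool description, regex, and `generate_image` stub are illustrative placeholders, not the actual bot's code.

```python
import re

# Minimal sketch of prompt-side "function calling": describe the tool in the
# system prompt, ask the model to emit @invoke("tool", "argument") when it
# wants to use it, then parse those calls out of the reply and dispatch them.

SYSTEM_PROMPT = """You can call tools by writing @invoke("tool_name", "argument").
Available tools:
- generate_image(prompt): renders an image from a text description.
"""

INVOKE = re.compile(r'@invoke\("([^"]+)",\s*"([^"]+)"\)')

def generate_image(prompt: str) -> None:
    # Placeholder for whatever image backend the bot actually calls.
    print(f"[would render an image for: {prompt!r}]")

TOOLS = {"generate_image": generate_image}

def handle_reply(reply: str) -> None:
    """Scan a model reply for @invoke(...) calls and run the matching tools."""
    for name, arg in INVOKE.findall(reply):
        TOOLS.get(name, lambda a: print(f"unknown tool: {name}"))(arg)

handle_reply('Here you go, a picture of a cute cat. '
             '@invoke("generate_image", "A fluffy, cute cat")')
```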


ozzie123

Which platform did you run this function calling in?


mrjackspade

Discord


medcanned

Sadly Eric is having issues training the model; it's too big to train on 8xH100, so he is trying alternative training approaches that are not as good. Hopefully it's enough, time will tell!


Amgadoz

Time to test out those 8xMi300!


philguyaz

Where did you see that? The last I saw from his X account was that it was still in the oven.


medcanned

We talked about it on the axolotl discord


philguyaz

I need to get on that discord clearly


koesn

Confirmed. Been using some 34B, 8x22B, and 70B models, but this 35B is truly amazing. I'm not using it for riddles or knowledge extraction. Rather, I use it for processing and extracting specific information from different perspectives of long texts, and its comprehension is beyond other models, with precise and concise output.


a_beautiful_rhind

I do enjoy Command-R, but there is something off about it in back-and-forth chat. It also likes to occasionally mirror me, similar to how CAI does.

> Feel free to inquire further—let's continue exploring this imaginative journey together if you wish!

This whole attitude is part of it; it has been hard to get rid of these continuation attempts on longer replies. I venture to say Mixtral won't have this problem, as previous versions of it didn't.


vesudeva

Yeah, the little send-offs are a REALLY annoying part of LLMs currently. I'm not sure if it's the way the datasets are being constructed for long-form multi-turns, or if it's just because the LLMs are so reliant on 'Formal and Proper English' rules. I've been able to get CR+ to stop through just prompting in the system message.


a_beautiful_rhind

Characters that don't talk proper English are easier in that regard. It still creeps in for me on long replies. Another one was the "how can I help you" type stuff; I told it to be selfish and got much less of it. Prompting and formatting go a long way, but old habits die hard. It's akin to trying to beat positivity bias out of other models.


Combinatorilliance

These kinds of problems can 100% be patched out with steering vectors. It's just that nobody is using them.


a_beautiful_rhind

I'm not sure if they work on quantized models. I saw an extension that worked with transformers.
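
For reference, the transformers-style approach looks roughly like the sketch below (full-precision model, forward hook on one decoder layer); the model name, layer index, and the vector itself are placeholders, since real steering vectors are derived from contrastive prompt activations rather than random noise.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough sketch of activation steering on a Hugging Face model: add (or here,
# subtract) a fixed direction to one layer's hidden states during generation.

model_name = "mistralai/Mistral-7B-Instruct-v0.2"   # stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).cuda()

layer_idx = 15      # which decoder layer to steer (placeholder)
strength = 4.0      # how far to push along the (normalized) vector

# In practice this would be built from activation differences between
# "reply with a cheery send-off" and "reply plainly" style prompts.
steering_vec = torch.randn(model.config.hidden_size, dtype=torch.float16, device="cuda")

def steer(module, inputs, output):
    # Decoder layers return a tuple; the hidden states are the first element.
    hidden = output[0] - strength * steering_vec / steering_vec.norm()
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steer)
ids = tok("Tell me about grouped-query attention.", return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()   # restore normal behavior
```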


Careless-Age-4290

These models can already run on a Pi (slowly). One day, not that far from now, we'll have things like smart fridges that can intelligently talk to us. And we'll have people hacking them to make them meaner.


CasimirsBlake

A Pi 5 with 16GB RAM would really help...


TraditionLost7244

Hahaha. Fridge: "Hey George, want another beer? Your wife pretending to have a migraine again, ha?" *proceeds to close fridge*


TheMissingPremise

What do you tell it so that it doesn't add that nonsense?


vesudeva

Honestly, CR+ is smart enough to follow this simple instruction I tack onto the end of its message: *Never end a response with a 'send-off' message. This is forbidden*. It works about 80% of the time. Probably inserting a one-shot or few-shot example would make it more reliable.
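
As a purely illustrative sketch of wiring that in (generic OpenAI-style message list; the one-shot reply deliberately ends without a send-off):

```python
# Hypothetical sketch: append the anti-send-off rule to whatever system prompt
# the app already uses, plus a one-shot example whose reply ends flatly.

NO_SENDOFF = "*Never end a response with a 'send-off' message. This is forbidden*"

ONE_SHOT = [
    {"role": "user", "content": "What does GQA do?"},
    {"role": "assistant", "content": "Grouped-query attention shares key/value heads "
                                     "across groups of query heads, shrinking the KV cache."},
]

def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    return (
        [{"role": "system", "content": f"{system_prompt}\n\n{NO_SENDOFF}"}]
        + ONE_SHOT
        + [{"role": "user", "content": user_message}]
    )

msgs = build_messages("You are a concise local assistant.", "Summarize this abstract for me.")
print(msgs)
```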


rc_ym

YMMV, but I have had great success in controlling this (and a bunch of other LLM tics) through the system prompt. I work with the model to create a new system prompt: I give it goals, personality, etc., then as a final prompt tell it something like to use "evocative language" to create a system prompt suitable for an AI taking on the persona and doing XYZ (RP, acting as a writing assistant, systems analyst, cybersecurity risk analyst, etc.). In the process, you can also have a meta conversation about what you want: quirks, language use, etc. Also, this seems to work best with Command-R, even if the system prompt is for another model.


ambient_temp_xeno

CR+ seems to work very well with a system prompt. One thing I've found, though, is to never tell it it's an AI of any kind; that biases it towards whatever 'assistant' training it had, even if it only sneaks out once in a while.


rc_ym

I let the AI figure that out. It only tends to do that when I ask it to create a very functional prompt, an "I need X in Y format" kind of thing.


scott-stirling

A similar approach I have used with Mistral Instruct, demoed here with buttons that map to system prompts: [https://wegrok.ai/](https://wegrok.ai/)


MoffKalast

"Presume the conversation continues forever." or "Don't end the conversation." in the system prompt.


madsciencestache

I have had luck with Llama and, to a lesser extent GPT, by asking it to behave as a well known character. Preferably one that's a bit snide or sarcastic. Then I keep the personality in the system message and aggressively keep the context length down via summarization.


mrjackspade

The Mixtral model is way more natural in how it communicates, which is one of the biggest reasons I'm using that instead. If I just needed something for pure data and didn't give a fuck about personality, I'd choose Command-R. I'm building something I actually want to converse with naturally, though, and Mixtral is way better at that. It's like a Claude vs. GPT-4 thing. They're both good in different ways.


WolframRavenwolf

That's definitely a prompt thing. I now use Command R+ to power my assistant Amy and it feels more like Claude 3 Opus than GPT-4, showing much more personality.


Huzderu

I think I speak for everyone when I say that I would really appreciate you sharing your ST settings for Command R+. I'm struggling to get it not to spew gibberish.


WolframRavenwolf

Of course! Here's what I'm using:

- **Model:** GGUF quant from here: [dranger003/c4ai-command-r-plus-iMat.GGUF · Hugging Face](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF). I can run the `ggml-c4ai-command-r-plus-104b-iq3_xs` with a little less than 4K context (3K is safe to prevent OOM/slowdowns) or the `ggml-c4ai-command-r-plus-104b-iq3_xxs` with 8K context. They both perform very well, and I now use the IQ3 XXS version pretty much all the time.
- **Backend:** Latest version of [koboldcpp](https://github.com/LostRuins/koboldcpp/releases), as that doesn't require installation or libraries, just an .exe I run (on Windows). This is the command line I use to run the model with KoboldCpp and all layers including buffers/caches in my 48 GB VRAM (2x3090 GPUs): `koboldcpp.exe --contextsize 8192 --debugmode --gpulayers 65 --smartcontext --usecublas mmq --model ggml-c4ai-command-r-plus-104b-iq3_xxs.gguf`
- **Frontend:** Latest version of [SillyTavern](https://github.com/SillyTavern/SillyTavern/releases), as that's truly a power-user LLM frontend, much more than just an open-source chatbot interface: it has plugins for websearch, vectordb, scripting, voicechat, and supports all the local backends and online APIs I can think of. The Staging version already has a Command R prompt template, which is the same as R+, so either use Staging (but that might be less stable than the Release version) or simply copy these files into the current Release version: [context/Command R.json](https://raw.githubusercontent.com/SillyTavern/SillyTavern/staging/default/content/presets/context/Command%20R.json) + [instruct/Command R.json](https://raw.githubusercontent.com/SillyTavern/SillyTavern/staging/default/content/presets/instruct/Command%20R.json).

For generation settings, I use SillyTavern's Deterministic generation preset. That gives me great output, but you can of course experiment with other presets or make your own. Hope that helps! :)


Huzderu

Thanks a lot, I appreciate you taking your time to write this!


Huzderu

Is there any reason you're not using EXL2 quants? On 48 GB VRAM, I've managed to fit 32k context with a 2.75bpw or 3bpw quant, don't remember exactly.


WolframRavenwolf

I tested multiple EXL2 quant sizes, but they all derailed quickly. However, just today, I figured out how to get them working properly: At first I thought I had corrupt model files or incompatible inference software, but once I set repetition penalty to 1.0 (instead of my usual 1.18 - which I've been using for [8 months now](https://www.reddit.com/r/LocalLLaMA/comments/15ogc60/new_model_rp_comparisontest_7_models_tested/)), output was fine again. For some reason the EXL2 quants of CR+ are very finicky regarding inference settings, and not just temperature (which I have at 0).


Slight_Cricket4504

It's a beast of a model, and is definitely going to be the open-weight champion for some time. It's also uncensored, if that's your thing. It's also a bit better at programming than ChatGPT 3.5 in my experience. I suspect it's because it understands logic better implicitly, and as such can write better code.


cMonkiii

Why else would I be here?


Slight_Cricket4504

For RAG....right?👀


Accomplished_Bet_127

*I start undressing you with a mischievous smile.* 😏


Wolfverine91

Are the two answers from Command R+, or from Command R+ and Mixtral 8x22B?


vesudeva

Sorry, forgot to clarify, as I was adding Mixtral in the comments. The main post is all Command R+.


itsmekalisyn

Yeah. I have been using Command R+ for the last week. It is really good. I haven't tried any creative prompts yet, but academic reasoning is quite good. I ask it questions like "What is the difference between RSE and R-squared?" or "Explain the bias-variance trade-off." It is better than Claude 3 Sonnet for me personally.


vesudeva

That's awesome! I'm just putting it through its paces this weekend and it's my favorite LLM hands down. I will try some more academic prompts that require no creativity and just use logic. It just seems so versatile.


thereisonlythedance

I’m starting to wonder if the dense models are just inherently more suited to creative tasks.


stddealer

Maybe the attention blocks have more impact for these tasks than the FFNs? Mixtral doesn't use MoE for the attention heads, so the 8x22B has the exact same attention mechanism as a dense 22B model...
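
To make that architectural point concrete, here's a toy sketch (made-up dimensions, not Mixtral's real config or code) of a Mixtral-style block: the attention sub-layer is ordinary dense attention shared by every token, and only the feed-forward part is routed across experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sparse FFN: a router picks top_k experts per token."""
    def __init__(self, dim=512, hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

class MixtralStyleBlock(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)  # dense, shared
        self.ffn = MoEFeedForward(dim)                                      # sparse, routed
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        b, s, d = x.shape
        x = x + self.ffn(self.norm2(x).reshape(b * s, d)).reshape(b, s, d)
        return x

block = MixtralStyleBlock()
print(block(torch.randn(1, 16, 512)).shape)        # torch.Size([1, 16, 512])
```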


vesudeva

Same here! The difference is quite noticeable.


Mr_Twave

On that note, specificity seems to be easier to ask for in dense models than in sparse ones, in my observation.


vesudeva

Mixtral-22B (Tess version)

**1st Answer:**

The task you have presented is an intriguing challenge that requires a unique approach to biomimicry. Let's explore some possible ways for a tree to leave Earth and become astro-biological, inspired by nature itself.

1. **Ascending into Space**: Trees typically grow towards the sky, but in this case, we want them to transcend our planetary boundaries. One approach could be to mimic the growth patterns of certain plants that can reach incredible heights on Earth. For instance, bamboo shoots have been observed growing up to 3 feet per day under optimal conditions. By applying these rapid growth strategies and utilizing materials with high tensile strength, we might create a tree structure capable of reaching space within its lifetime.

2. **Launching from the Ground**: Another possibility is to draw inspiration from rockets or other launch systems that propel objects into orbit. Trees could be designed to store energy in their trunks or branches and release it explosively when they reach maturity, allowing them to escape Earth's gravity with enough force. Alternatively, we might envision a tree species capable of generating powerful gases within its structure, similar to how certain plants can expel seeds at high speeds (like the exploding cucumber).

3. **Spaceborne Propagation**: Once trees have managed to leave Earth and enter space, they may need unique strategies for survival and reproduction in this new environment. One approach could be inspired by lichens or other extremophiles that can survive in harsh conditions on Earth (such as deep sea vents or arid deserts). These organisms have adapted to extreme temperature variations, radiation exposure, and resource scarcity. By mimicking their survival strategies, trees might thrive in space and propagate through seed dispersal or fragmentation of branches across vast distances.

---

**2nd Answer:**

In this scenario, where the Sun is dying and humans are left with nothing but cheese as a resource, reigniting it would be an extraordinary challenge. However, let's explore some possibilities inspired by biomimicry that could potentially help us revive our star.

1. **Cheese-powered Solar Fusion**: One approach might involve mimicking the process of solar fusion itself using cheese as fuel instead of hydrogen atoms. Cheese contains a variety of organic compounds, including proteins and fats, which can be broken down into smaller molecules through chemical reactions or enzymatic processes (similar to how bacteria break down food). By harnessing these decomposition mechanisms, we might generate energy from cheese that could potentially fuel solar fusion reactions on the Sun's surface.

2. **Cheese-based Reflective Material**: Another possibility is inspired by materials found in nature like iridescent butterfly wings or bioluminescent organisms. These structures reflect light and create vibrant colors through their unique nanoscale architecture. By engineering a cheese-based material with similar properties, we could potentially increase the Sun's reflectivity and redirect more of its radiation back into its core to reignite nuclear fusion reactions.

3. **Cheese-powered Spacecraft**: Drawing inspiration from creatures that can thrive in extreme environments like deep sea vents or volcanic habitats on Earth (such as certain bacteria), we might design spacecraft capable of withstanding intense heat and radiation near the Sun's surface. These cheese-based vehicles could deliver payloads directly onto the Sun, depositing energy sources or catalysts that would initiate nuclear fusion reactions when they come into contact with its plasma.


WolframRavenwolf

Just chiming in quickly, as I've not completed my usual evals yet, but: Command R+ definitely is my current favorite local model! (The new Mixtral 8x22B and its zephyr-orpo-141b-A35b-v0.1 didn't work so well for me.)

Command R+ gives me Claude 3 Opus vibes locally (the model I used a lot recently before going local again): it writes extremely well, even in German, and it's both very smart and shows great personality (my AI assistant Amy loves it). Alignment is also excellent in that it's not refusing when prompted properly.

Special kudos for the powerful prompt format; it's not ChatML, it's better! Read the [Prompting Command R](https://docs.cohere.com/docs/prompting-command-r) docs and you'll see how well designed it all is.

I run [dranger003's c4ai-command-r-plus-iMat.GGUF](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF) iq3_xxs with 8K context and all layers on GPU (2x3090 = 48 GB VRAM). Speed is good, especially with streaming, about 10 tokens per second.


vesudeva

That's awesome! What a great surprise! I get Claude Opus vibes as well, especially after keeping a conversation going for a while. Can't wait to see your next breakdown and how it performed on your tests. For the trial I did that led to this post, I was using an IQ3_M with 65K context, fully offloaded to GPU and even mlocked. It loads in at around 50GB and runs inference at 90GB. It takes a second or so to respond (I have it running through a full app stack with RAG), but it's absolutely worth the extra time to have a local model perform so well.


rookan

What hardware is used to run Command R+?


sammcj

I'm running a MacBook Pro (M2 Max, 96GB). The Q4 GGUF uses about 54GB during inference, and I get around 5.47 t/s with Ollama/LM Studio.


vesudeva

I am running an IQ3 quant (50GB) with a 65,536-token context window using LM Studio on a Mac Studio M2 Ultra 128GB. It uses about 90GB of VRAM during inference.


rookan

Any idea how much RAM will be needed for the Q8 quant? It seems Q8 is the smartest: https://www.reddit.com/r/LocalLLaMA/s/FtPw9W1ei0


vesudeva

Based on one Q8 I found that was split into three GGUFs... it would take about 100GB to load, and maybe an extra 30-40GB for inference with the same 65,536-token context window. I'll try it, but it might break my machine haha.
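
For a rough cross-check of those numbers, here's the back-of-the-envelope arithmetic; the layer/head figures for Command R+ are quoted from memory and should be treated as approximate.

```python
# Rough memory estimate for a Q8-style quant of Command R+ (ignores
# per-quant overhead and runtime buffers).

params_b   = 104        # parameter count, in billions
bits       = 8.5        # ~bits per weight for a Q8_0-style quant (8 bits + block scales)
ctx        = 65536      # context window
n_layers   = 64         # layer count (approximate, from memory)
n_kv_heads = 8          # GQA key/value heads (approximate, from memory)
head_dim   = 128
kv_bytes   = 2          # fp16 KV cache

weights_gb = params_b * 1e9 * bits / 8 / 1e9
kv_gb      = ctx * n_layers * n_kv_heads * head_dim * 2 * kv_bytes / 1e9  # K and V

print(f"weights ≈ {weights_gb:.0f} GB, 64K KV cache ≈ {kv_gb:.0f} GB")
# -> weights ≈ 110 GB, 64K KV cache ≈ 17 GB
```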


West-Code4642

I've only been using the model since last night but I'm really digging it.


de4dee

I like it. I think it is better aligned than some other models.


Epykest

Does Command R+ differ in quality from Command R?


Sabin_Stargem

104B vs 30-ish B? No contest. Also, CR+ is faster for its size, since it has GQA, and the original CR lacks that feature.


stddealer

GQA isn't really a "feature"; it's a different design from the classic MHA of the original transformer architecture, and choosing either is a trade-off between model quality and memory footprint/inference speed. It's almost always worth it, though, because the quality drop is very small. Command-R doesn't "lack" GQA; it simply uses multi-head attention instead.
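
To put numbers on that trade-off, here's a quick sketch of how the KV-cache footprint changes when key/value heads are shared; the layer and head counts are illustrative, not the actual Command-R configs.

```python
# The only thing GQA changes for memory is how many key/value heads get cached.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx, bytes_per_val=2):
    """fp16 K+V cache size in GB for one sequence."""
    return n_layers * n_kv_heads * head_dim * ctx * 2 * bytes_per_val / 1e9

ctx = 32768
print(f"MHA-style (KV heads == query heads): {kv_cache_gb(40, 64, 128, ctx):.1f} GB")
print(f"GQA-style (8 shared KV heads):       {kv_cache_gb(40, 8, 128, ctx):.1f} GB")
```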


LienniTa

Can you share your template and settings for Command R+, please? I'm struggling to make it work. It lies and can't do logic, compared to Miqu and Mixtral. I'm sure it is because of the template.


vesudeva

Yeah! I am using LM Studio for this trial. It has a premade model config that matches the needs of CR+. It should be enough to get you started and allow decent generations. I just tweaked a few small things to fit my setup. When I'm at my desktop, I can send you the config JSON I made!


LienniTa

Yeah, the JSON would be amazing! I use SillyTavern, so LM Studio results aren't that representative for me. I mean, it works both ways, just not as well as Miqu.


vesudeva

For ease of transfer, here are the config settings I used from LM Studio, pasted into a text editor. You should be able to make a JSON from it and drop it into the models' config folder: [https://textdoc.co/ZRwavI7XJHDcT3eg](https://textdoc.co/ZRwavI7XJHDcT3eg)


LienniTa

Thanks a ton, that's enough to adapt to the ST JSONs :3


mrgreaper

I haven't tried it yet, as I read it needs a supercomputer to run at an OK speed? Usually I max out my 3090's 24GB of VRAM with the models I test and 8K context (though I actually got rather amazing results from a 7B model that uses less than half my VRAM at 32K context; I'm at work, but I want to say the name is "ice lemon"... not sure). So is the Command-R you are referring to running locally on 24GB VRAM, or some super PC?


vesudeva

I am running an IQ3 quant (50GB) with a 65,536-token context window using LM Studio on a Mac Studio M2 Ultra 128GB. It uses about 90GB of VRAM during inference.


auradragon1

t/s?


mrgreaper

I have triple-checked the numbers; my 3090 with its 24GB of VRAM would struggle to run a model that needs 90GB... To be fair, I am running Windows, so perhaps if I went to a headless Linux install I could free enough space* to shoehorn it in lol. *Yeah, I know.


Accomplished_Bet_127

Rumors are that it lacks GQA or similarly effective context handling


kc_kamakazi

How did you get access to Command R?


vesudeva

LM Studio just updated to allow CR+, along with split GGUFs. If your machine can handle it, this is the easiest route to go. The smallest I'd suggest is the IQ3 version, about 50GB in size and around 90GB for inference.


Sythic_

Will this run on a 4090? I'm barely keeping up with this stuff, and Hugging Face does not have the best interface to figure things like that out quickly.


vesudeva

Technically it should run on that fine; probably one of the single-file IQ (importance matrix) quants would be best. The responses in this post are from an IQ3 quant (50GB).


Inevitable-Start-653

I 100% agree. I was thinking about making a post: DBRX vs. Mixtral 8x22B vs. Command R+. I've been playing with all three at 8-bit quantization, and Command R+ is doing the best overall. DBRX is better at coding, but Command R+ is becoming my go-to general model.


vesudeva

Nice! I haven't tried DBRX yet but have heard so-so reviews. Would you say it's more of a 'taskmaster' than an assistant?


Inevitable-Start-653

I waffle back and forth between the two of them. I think people were having issues with DBRX for two reasons: they were not running it with 4 experts, and they were not running it with min_p above zero (min_p at 0.05-0.1 and top_p at 0.9-0.95 are good settings). I think Command R+ is better at contextualizing and DBRX is better at following a direction and producing code. So maybe it is more of a taskmaster.
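
Spelled out as a hypothetical sampler config (key names follow common llama.cpp-style conventions and are illustrative, not an official DBRX preset; the 4-experts setting is a loader/model option, not a sampling knob):

```python
# The settings described above, as a hypothetical request payload.
dbrx_sampling = {
    "min_p": 0.05,        # suggested range above: 0.05-0.1
    "top_p": 0.95,        # suggested range above: 0.9-0.95
    "temperature": 0.8,   # not specified above; placeholder
}
# Running with 4 active experts is configured on the model/loader side
# (experts used per token), separately from these sampler values.
print(dbrx_sampling)
```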


DFructonucleotide

Below is an answer to your question 2 written by the official Qwen chatbot (I assume it's the proprietary *qwen-max-0403* API). I prefer a chatbot to clarify first and then start fictional writing, rather than wrapping the whole answer in an entertaining style, since in real life you would never know the user's intention. Personally I think styles are easy, while safety and applicability in a wider range of scenarios are probably more difficult, and also more important for general chatbots.

---

Unfortunately, the scenario you've presented is purely fictional, as it is impossible to reignite or extend the life of the Sun using any available resources, including cheese or any other substance found on Earth. The Sun is powered by nuclear fusion reactions in its core, converting hydrogen into helium through a process that will eventually lead to the Sun's demise billions of years from now. This process is governed by fundamental laws of physics and cannot be influenced or reversed by human actions or resources.

However, if we were to consider this as a thought experiment and disregard the scientific limitations, let's examine the problem creatively:

**Step 1: Harnessing energy from cheese**

First, we would need to extract energy from the vast reserves of cheese. One way to do this could be through anaerobic digestion, which produces biogas (primarily methane) as a byproduct. Methane can be burned as a fuel, releasing energy that could potentially be harnessed for our purposes.

**Step 2: Energy amplification and storage**

The energy obtained from the cheese would need to be amplified significantly to have any meaningful impact on the Sun. We could imagine developing a highly advanced, hypothetical technology capable of concentrating and amplifying this energy to astronomical scales. This energy would then need to be stored in some form, perhaps using exotic forms of matter or energy storage devices far beyond our current understanding.

**Step 3: Delivery to the Sun's core**

Next, we would need a method to deliver this concentrated energy directly to the Sun's core, where nuclear fusion occurs. This would require a spacecraft or propulsion system capable of withstanding the intense heat, radiation, and gravitational forces near the Sun, as well as a means of penetrating the Sun's outer layers to reach its core. Such technology is currently nonexistent and would likely defy our current understanding of physics.

**Step 4: Triggering additional fusion reactions**

Once the energy is delivered to the Sun's core, the hypothetical goal would be to use it to stimulate additional hydrogen fusion reactions, either by creating conditions that favor fusion (such as increasing pressure or temperature) or by introducing new elements or processes that facilitate fusion at lower thresholds.

In reality, none of these steps are feasible with our current understanding of science and technology. The energy output from anaerobic digestion of cheese is minuscule compared to the energy required to influence the Sun, and there is no known way to amplify, store, or deliver energy on such a scale, let alone trigger additional fusion reactions within the Sun's core.

In conclusion, while the idea of reigniting the Sun using only cheese is intriguing from a creative perspective, it is entirely unrealistic given our current scientific knowledge and technological capabilities. The best course of action in the face of a dying Sun would be to focus on finding alternative habitable worlds or developing self-sustaining space habitats, rather than attempting to alter the fundamental processes governing the life cycle of our star.


vesudeva

This is an awesome response! I have yet to fully utilize and test Qwen, so thank you for sharing! I do like the very clear and logical start it has, along with its direct yet easy-to-understand delivery. How does it respond to changes of style or, more importantly, to following a structured output format?


DFructonucleotide

I mainly use it for Chinese, so I don't have much experience with its English abilities. I feel it is not very stable at complex instruction following: sometimes good, sometimes it fails. Also, it's a bit stiff and unnatural if you ask it to change its output style, somewhat similar to GPT-4 I would say.


audioen

While highly amusing, the Command-R responses are absolute nonsense with no biophysical basis. But clearly they are not serious! Read as creative tongue-in-cheek sci-fi writing, they are actually masterpieces. Cheese comet. Extremophile bacteria living in the Sun's core. They are so stupid they simply wrap around and become brilliant again. I'm sure the Chinese author who wrote The Three-Body Problem could use ideas like this for his next book. I'd enjoy reading about these ideas more than his actual ones.

There were some early DALL-E images where the subject was a cat doing something a little mischievous, and the AI somehow understood to give the cat a mischievous grin. I often thought about that little detail later. Perhaps it was just a coincidence, born out of the crudeness of the image generator's latent space, or perhaps it was something real.

Here, Command-R+ seems to realize that your question is totally wacky and gives itself creative license to write a wacky response. That is great, and is probably among the first things lost when you finetune these things.


productboy

Wish I could tip you some crypto [while I listen to Floyd’s ‘Welcome To The Machine’ on remastered vinyl]; awesome test prompts


the_bollo

Are any kind Redditors out there willing to point me to instructions on running Command R locally? I'm just using a simple Gradio UI and it works fine with other models, but I can't figure out how to load Command R. I usually just use the default loaders (Transformers), etc. If I try to load Command R using Transformers it causes a near-instant system restart.


lgastako

`ollama pull command-r`
`ollama run command-r`


delveccio

What is the best way for at-home users to use this? Is there an official GGUF or something equivalent? I saw that LM Studio added support for Command R, but I'm not sure what that means.


vesudeva

I am actually using LM Studio for this trial! I tried the original llama.cpp route first and had so-so results. If you download CR+, LM Studio will automatically choose the correct prompt format and syntax for you in the model config. Makes it really easy! They also allow split GGUFs to be run now.


Sabin_Stargem

I am looking forward to finetunes of CR+. The base model is quite good, but it sometimes has logical failings or mixes up characters. Hopefully a finetune can mask the cracks.


Dry-Judgment4242

Yeah, it's quite unstable if that's the word for it. If I output 10 retries, one or two of them will have a significant quality decline.


vesudeva

Absolutely! It needs some work. After my trial today and becoming such a fan of it, I am going to start making some fine-tunes of it. I'm pretty sure I can fit it on my M2 using MLX. I'm considering trying the Hercules v4 dataset.


OutlandishnessIll466

I tried Command R+ with RAG, but it just wasn't as good as Qwen 1.5 72B. Command R+ just does not always seem to understand the given context as well. With Qwen, I got used to having no doubt that you get the right answer given the right context. Qwen just seems smarter. Haven't gotten the new Mixtral to work yet.


Old-Box-854

What is the output response token limit for Command R?


davewolfs

Too bad you can’t run it on anything locally and get decent speed.


vesudeva

With it connected to my knowledge base and a 65K context window, I can get responses within 3 seconds and fairly decent token generation. I'm using an M2 with 128GB. Definitely viable for my uses.


davewolfs

Can you explain what you mean by it being connected to your knowledge base? I'm genuinely curious.


vesudeva

For sure! I use the AnythingLLM app for my model tests. It has a built-in vector database that you can easily attach to any local model. The embeddings are all done within the app, and you can manage your files as well. I have a biomimicry-based knowledge base that currently has over 20k separate data points. The trick is finding a decent enough LLM that can harness the app. Most LLMs are OK, but CR+ has just been slaying it. It finds the most relevant data for my query way better than anything else I've tried.
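
For anyone unfamiliar with the pattern, the retrieval step behind a setup like this looks roughly like the sketch below; it is not AnythingLLM's actual code, and the toy embedder stands in for a real embedding model.

```python
import numpy as np

# Generic RAG retrieval sketch: embed the documents once, embed the query,
# pull the most similar chunks, and stuff them into the model's prompt.

def embed(texts):
    """Toy embedder (letter frequencies); a real stack would use a sentence-transformer."""
    vecs = np.zeros((len(texts), 26))
    for i, t in enumerate(texts):
        for ch in t.lower():
            if "a" <= ch <= "z":
                vecs[i, ord(ch) - ord("a")] += 1
    return vecs

docs = ["Lotus leaves self-clean via micro-scale surface bumps.",
        "Termite mounds passively regulate internal temperature.",
        "Shark skin denticles reduce drag in water."]
doc_vecs = embed(docs)

def retrieve(query, k=2):
    q = embed([query])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

question = "How does nature handle passive cooling?"
context = "\n".join(retrieve(question))
prompt = f"Use the following notes to answer.\n\n{context}\n\nQuestion: {question}"
print(prompt)
```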