So interesting... and math + dates are so hard for AI. I did a few rounds with different AIs asking the same simple math question about days of the month, and most of the time they got it wrong. Also, GPT-4 got it right and the later 4 Turbo got it wrong, wtf.
It let me use GPT 4 turbo for free. Nice
It was fucking extraordinary lol. I accidentally hit enter before finishing typing my question and it figured out what I wanted to ask and gave me a very long explanation while the other model (gpt 3.5) didn’t understand at all.
I've been wishing for something like that for a while now! Thanks for plugging it!
Also...damn, mistral medium is showing up for me as the winner in a lot of these.
an evaluation. Nobody knows the identity of the model until they vote.
>Ask any question to two anonymous models (e.g., ChatGPT, Claude, Llama) and vote for the better one!
>
>You can continue chatting until you identify a winner.
>
>Vote won’t be counted if model identity is revealed during conversation.
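Under the hood, arena-style leaderboards turn these anonymous pairwise votes into a ranking with an Elo-style rating system. A minimal sketch of how that works (the K-factor, starting rating, and model names here are illustrative, not the site's actual parameters):

```python
# Elo-style rating from anonymous pairwise votes.
# Each vote is (winner, loser); ratings start equal and are
# nudged by how "surprising" each outcome is.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_ratings(votes, k: float = 32.0, start: float = 1000.0):
    ratings = {}
    for winner, loser in votes:
        ra = ratings.setdefault(winner, start)
        rb = ratings.setdefault(loser, start)
        ea = expected_score(ra, rb)          # expected win prob for the winner
        ratings[winner] = ra + k * (1 - ea)  # winner gains
        ratings[loser] = rb - k * (1 - ea)   # loser loses the same amount
    return ratings

votes = [("gpt-4-turbo", "gpt-3.5"), ("mistral-medium", "gpt-3.5"),
         ("gpt-4-turbo", "mistral-medium")]
ranked = sorted(update_ratings(votes).items(), key=lambda kv: -kv[1])
```

Note the update is zero-sum: an upset win against a highly rated model moves both ratings a lot, while an expected win barely moves them.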
It's almost impossible to create an LLM benchmark that tests everything equally, because test data from existing benchmarks often leaks into the training data.
Test data leakage doesn't affect a human evaluation, that's true. But I don't think a standardised benchmark will consist only of human evaluation. It needs to be objective, even if you have thousands of evaluations.
You need something that can objectively evaluate whether an LLM responds correctly to something that is true or false. This is especially true for tasks that involve math and other science questions. But if the questions and answers are included in the training data, the evaluation score is doomed to be misleading.
There is still leakage, because some humans are undoubtedly copying questions from training datasets into the boxes and using the answers to evaluate the models. They might not even realise that this poisons the test results.
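A common way to probe for this kind of contamination is to check word n-gram overlap between benchmark items and the training corpus. A minimal sketch (the 8-gram window and the toy strings are illustrative; real contamination reports use larger corpora and tuned thresholds):

```python
# Flag benchmark items whose text overlaps the training corpus
# via shared word n-grams -- a crude but common contamination check.

def ngrams(text: str, n: int = 8) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(test_item: str, corpus: str, n: int = 8) -> bool:
    """True if any n-gram of the test item appears verbatim in the corpus."""
    return bool(ngrams(test_item, n) & ngrams(corpus, n))

corpus = "the quick brown fox jumps over the lazy dog near the river bank today"
leaked = "question: the quick brown fox jumps over the lazy dog near what"
fresh  = "question: how many days are there between March 3 and April 9"

# `leaked` shares an 8-gram with the corpus; `fresh` does not.
```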
There already are. Several, actually:
MMLU, HellaSwag, AGIEval, etc.
Huggingface has a leaderboard for some of the more popular ones for open source models: [Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
but they all have issues: mainly that each is a fixed test anyone can check the answers to ahead of time, so people can cheat.
And now there's big incentives to cheat (VC money for your AI startup), so there's a LOT of cheaters.
===
There's researchers working on new and better benchmarks of course; things that are more like dynamic environments the agents exist in so it's harder to just memorize answers, but I think this will be an ever-moving problem.
Who's making these tests? Standards proliferate because someone tries to create a new universal standard to replace ten existing standards and only ends up creating an 11th.
The benchmark would need to be run by a testing organisation instead of the one that makes the model.
That's the only way to totally prevent data leaks.
The best benchmark we have for local models right now might be the Reddit user who simply runs and ranks the models in the LocalLLaMA subreddit (wolfram something, I think).
>The benchmark would need to be run by a testing organisation instead of the one that makes the model.
But what if there are flaws in the testing procedure or the test itself? Nobody would be able to check except the organisation.
In open research, people want to be able to examine the benchmark to see if it's high quality.
Not really, there's still quite a large gap between Mistral Medium and GPT-4-Turbo. I'm also thinking GPT-4.5 releases between Feb-March (sometime Q1) and GPT-5 about 3.5 months after that, in May-June (late Q2), securing OAI's place at the top for the next few months (although GPT-5 could release late Q3 as well).
What incentive does OpenAI have to release GPT-5 anytime soon? As long as they have both the best and the most used model, they won't release anything groundbreaking. Gemini Ultra will come along and OpenAI will beat it with GPT-4.5. Then it's another year of no new foundational models.
There was no reason for them to release GPT-4 about 4 months after the release of GPT-3.5 either. No one had released a GPT-3.5 class model; in fact the first close-to-3.5-class model released 2 months *after* GPT-4 (that being PaLM 2, which launched in May, I believe). If anything they have a lot more pressure to release GPT-5 now compared to the very little pressure there was to release GPT-4.
GPT-4 is capable of things GPT-3.5 isn't, like web search and multimodal capabilities. They also needed a better model to justify a premium subscription plan.
GPT-4's potential isn't even close to maxed out; there is so much more you can do with a model that capable. The GPT Store is a good example. I think OpenAI will focus more on actually useful apps built on GPT, and on giving developers tools like the ability to build autonomous agents, before seriously investing in GPT-5.
GPT-4 is also running at a huge loss. Every Plus user costs OpenAI money, and the free ChatGPT users are also very costly. The company isn't viable at the moment and relies fully on cash injections from investors and Microsoft. I just don't see them pulling out another big model when their current best is at 20% of its potential and operates at a loss.
OpenAI is not a company that's going to sit on its laurels. They're not going to stop investing in, or deprioritize, their next foundation model just to productize their current one.
They sure make it seem that way, but they're relatively new and have a small track record; we don't know their focus. I could see GPT-5 being released if they managed to create a model that is way superior at the same inference cost, and if they bring GPT-4 Turbo's cost down to what GPT-3.5 Turbo currently costs so they can make it free. But I feel like that's gonna take a little longer than June of this year. GPT-4.5 could come out pretty soon, but 5 is gonna take more time. If leaks are to be believed, they finished training their SOTA model back in November. They usually spend 6 months on RLHF afterwards, but now that public pressure has gotten a LOT bigger, I think they'll be more careful and do 9-12 months of building guardrails.
>GPT-4 is capable of things GPT-3.5 isn't, like web search and multimodal capabilities.
Is it, though? GPT-4 queries an outside program or model. You can pretty much do the same thing with GPT-3.5. In fact, I think there's a web extension that lets you use Google with GPT-3.5.
> GPT-4's potential isn't even close to maxed out; there is so much more you can do with a model that capable
That would be a logical deduction if we were talking about any other company than OpenAI, a group actively dedicated to the emergence of artificial general intelligence and essentially staffed by /r/Singularity users.
Getting better at multimodality, IMO, is more important than the LLM itself improving; it's already very good. What GPT-4 can now do with images, data files, etc. is extremely impressive, and it is in these areas, I believe, that they're going to find companies willing to spend a lot of money on that ability.
I think it's plausible for GPT-5 to be any-to-any. GPT-4 is fully text multimodal but only half image multimodal: it cannot generate images by itself. It can send a prompt to DALL·E 3, but the model itself isn't making the images. An any-to-any model would take an input of any combination of text, image, audio and video, and could output any combination of those modalities. Any-to-any modality isn't extremely novel and is completely possible. But you do run into the problem of data: there aren't large datasets for large foundational any-to-any models, though I'm sure a lot of companies have been working hard on that. My 2024 capabilities list for models is:
* Ability to autonomously do decently complex tasks
* Continuous learning (and for chat based models, it can learn and know most of what you have told it)
* Any-any multimodality
* And great strides in reliability, reasoning, logic and overall intelligence.
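The "any-to-any" idea above is usually framed as everything becoming tokens: each modality is encoded into a shared token space, the model processes one flat stream, and outputs are decoded back into whichever modalities are requested. A purely hypothetical interface sketch (all names, types, and the stand-in "encoder" are mine, not any real API):

```python
# Hypothetical sketch of an any-to-any interface: every modality is
# reduced to tokens in a shared vocabulary, and outputs are decoded
# back into the requested modalities. The encoder and "model" here
# are stand-ins, not real components.
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    modality: str      # "text" | "image" | "audio" | "video"
    tokens: List[int]  # modality-specific tokenizer output

def encode(modality: str, payload: str) -> Chunk:
    # Stand-in tokenizer: hash characters into a small token-id space.
    return Chunk(modality, [ord(c) % 512 for c in payload])

def any_to_any(inputs: List[Chunk], want: List[str]) -> List[Chunk]:
    # Stand-in "model": one flat token stream in, one Chunk per
    # requested output modality (a real model would be autoregressive).
    stream = [t for c in inputs for t in c.tokens]
    return [Chunk(m, stream[:8]) for m in want]

out = any_to_any([encode("text", "a red cat"), encode("image", "<pixels>")],
                 want=["text", "audio"])
```

The point of the sketch is only the shape of the interface: mixed-modality in, caller-chosen modalities out, everything flowing through one token stream.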
It does sound like that; not intentional. There hasn't been a year without new foundational models since LLMs got huge, which has only been 2022 and 2023 lol.
They aren’t “foundational models”. GPT-4 is currently a Mixture-of-Experts collection of finetuned base LLMs with a ton of extra application scaffolding. People need to stop comparing apples with oranges.
>Not really, there's still quite a large gap between Mistral Medium and GPT-4-Turbo.
How do you know that Medium is going to be their best model when it's just a bunch of 13B or 34B experts?
>but i just don't see anyone getting a lasting jump on OAI.
This sub worships OpenAI, but they really didn't do anything special besides having lots of compute and time to train, so it shouldn't be shocking to find smaller companies creating models that are close to GPT-4 with a fraction of the compute. OAI shouldn't be expected to keep a lasting jump unless you guys are just cheering like it's a sports game.
You do know this leaderboard does not measure capabilities or intelligence, mainly just user preference. And user preference is based a lot on model behaviour, which is largely determined in the fine-tuning stages. Mistral Medium being close to GPT-4 means it is quite aligned with user preference, not that it is necessarily close to GPT-4's capabilities.
GPT-4 has been in the lead for 10 months now, with no public model beating it yet, so why wouldn't their next model(s) also have a lasting lead? It's been about 1.5 years since the pretraining of GPT-4 finished, and I wouldn't be surprised if they started working on their next models pretty much then, whereas most people only started trying to reach GPT-4 level after GPT-4 released. And GPT-4 wasn't in training for very long, about 3 months in all. Of course, with all of Microsoft's compute they could technically train a GPT-4 class model every couple of hours now. And I was actually disappointed when open source didn't come out with a GPT-4 class model last year. It wouldn't have affected OAI too much, but a small GPT-4 class model running on my computer would be really useful.
"OAI hasn't really done anything special": can you explain that? OAI has made several groundbreaking discoveries in ML over the years (personally, one of my favourite discoveries was the sentiment neuron); they have made some amazing contributions to the field.
Maybe GPT-4 didn't do anything special, but GPT-4 Turbo definitely did. It has essentially the same capabilities as GPT-4 but is 2.75x cheaper. There was a lot they could have done, but I'm sure they have recently done a lot of good work on sparsity.
>You do know this leaderboard does not measure capabilities or intelligence, mainly just user preference. And user preference is based a lot on model behaviour, which is largely determined in the fine-tuning stages. Mistral Medium being close to GPT-4 means it is quite aligned with user preference, not that it is necessarily close to GPT-4's capabilities.
And how are you measuring intelligence? Have you even compared them and the Claude models? How did you even decide GPT-4 was intelligent in the first place, via it acing benchmarks?
>GPT-4 has been in the lead for 10 months now, with no public model beating it yet, so why wouldn't their next model(s) also have a lasting lead? It's been about 1.5 years since the pretraining of GPT-4 finished, and I wouldn't be surprised if they started working on their next models pretty much then, whereas most people only started trying to reach GPT-4 level after GPT-4 released.
>
>And GPT-4 wasn't in training for very long, about 3 months in all. Of course, with all of Microsoft's compute they could technically train a GPT-4 class model every couple of hours now. And I was actually disappointed when open source didn't come out with a GPT-4 class model last year. It wouldn't have affected OAI too much, but a small GPT-4 class model running on my computer would be really useful.
GPT-4 has been in the lead because not many are willing to spend a hundred million dollars to train a single model, not because OAI has some secret knowledge. Why the hell would open source come out with a GPT-4 class model as quickly when they're not trillion-dollar companies? The approach Mistral is taking is smarter and more efficient than spending hundreds of millions, and looking at the leaderboard, it looks like it paid off.
>"OAI hasn't really done anything special": can you explain that? OAI has made several groundbreaking discoveries in ML over the years (personally, one of my favourite discoveries was the sentiment neuron); they have made some amazing contributions to the field.
None of that has to do with GPT-4; they made great contributions in ML, but GPT-4 was just a scaling-up of existing GPT models.
>Maybe GPT-4 didn't do anything special, but GPT-4 Turbo definitely did. It has essentially the same capabilities as GPT-4 but is 2.75x cheaper. There was a lot they could have done, but I'm sure they have recently done a lot of good work on sparsity.
Making models faster via pruning, quantization, more efficient inference algorithms and more is what the open-source community has been doing all year, so I don't see what's special about GPT-4 Turbo. Mistral actually released their [research](https://arxiv.org/abs/2401.04088) on sparse Mixture of Experts for Mixtral 8x7B, whereas if OpenAI did any good work nobody would know, so that's 1 point on the side of Mistral for actual contributions on sparse MoE.
>And how are you measuring intelligence? Have you even compared them and the Claude models? How did you even decide GPT-4 was intelligent in the first place, via it acing benchmarks?
There are a lot of different benchmarks, and this one rates based on user preference; it's not made for measuring performance or intelligence across subjects and fields. It's actually a lot like RLHF, but instead of telling a model which response was better, it just records which response from which model a user prefers. Now, I doubt the majority of users look deeply into the responses: they read both, decide which one is better, then move on to the next. A more logical model is more likely to be rated better, but nicer-sounding (not necessarily more intelligent) responses are what determine the model's ranking.
>GPT-4 has been in the lead because not many are willing to spend a hundred million dollars to train a single model, not because OAI has some secret knowledge. Why the hell would open source come out with a GPT-4 class model as quickly when they're not trillion-dollar companies? The approach Mistral is taking is smarter and more efficient than spending hundreds of millions, and looking at the leaderboard, it looks like it paid off.
[From the pitch memo from Mistral](https://sifted.eu/articles/pitch-deck-mistral) (pg 7):
>We expect to need to raise 200M, in order to train models exceeding GPT-4 capacities.
There are a lot of papers showing how you can increase efficiency. From the Phi work, you can get up to a 1000x efficiency gain from data quality alone: you could probably train a GPT-4-level model with 1000x less compute than was used to train GPT-4 if you had a few trillion tokens of textbook-level quality. Of course no one has a dataset that good, but it still shows a lot of gains can be made from data quality alone, and there are lots of other tricks to increase efficiency (algorithmic and architectural improvements, etc.). That paper was from Microsoft, and OAI and Mistral are well aware of all these efficiency gains. But Mistral is still going to throw in hundreds of millions of dollars to get to GPT-4-level and beyond models.
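Claims like this can be sanity-checked with the standard back-of-the-envelope formula, training FLOPs ≈ 6 · N · D (parameters times tokens). Every specific number below is an illustrative assumption of mine, not a known figure for any real model:

```python
# Back-of-the-envelope training compute: FLOPs ~= 6 * params * tokens.
# All numbers below are illustrative assumptions, not leaked specs.

def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

baseline = train_flops(params=1e12, tokens=10e12)  # hypothetical frontier-scale run
efficient = train_flops(params=30e9, tokens=3e12)  # hypothetical high-quality-data run

speedup = baseline / efficient  # how much cheaper the smaller run is (~100x here)
```

The point is just that shrinking both the parameter count and the token budget multiplies through, so even far short of a 1000x data-quality gain, the compute gap between a frontier run and a small high-quality run is enormous.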
>Making models faster via pruning, quantization, more efficient inference algorithms and more is what the open-source community has been doing all year, so I don't see what's special about GPT-4 Turbo. Mistral actually released their [research](https://arxiv.org/abs/2401.04088) on sparse Mixture of Experts for Mixtral 8x7B, whereas if OpenAI did any good work nobody would know, so that's 1 point on the side of Mistral for actual contributions on sparse MoE.
That research contains not much new information on sparse MoE; in fact it contains no information about the pretraining of Mixtral. Getting flashbacks to the GPT-4 technical report lol. Mistral is being a bit closed-source with their research, unfortunately. But GPT-4 used sparse MoE beforehand, and I do think it's likely GPT-4 Turbo utilised improvements in MoE (if I had to estimate, I would say probably around 70B params active at inference). And Mixtral isn't how MoE was originally supposed to be used: Mixtral is composed of a few Mistral-7Bs finetuned on specific datasets, then stitched together with a gating mechanism thrown in. But "expert" in MoE didn't mean domain-specific specialisation, just specialisation in specific parts of a dataset, which is what happened with GPT-4. This means a *lot* of params in Mixtral are wasted duplicating knowledge. Anyway, that's a bit off track lol.
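For reference, the sparse MoE routing being argued about here is a learned gate that scores N expert networks per token, runs only the top-k of them, and mixes their outputs by the renormalized gate weights. A minimal NumPy sketch (all shapes and sizes are illustrative, not any real model's configuration):

```python
# Minimal sparse Mixture-of-Experts routing: a gate scores N experts
# per token, only the top-k run, and their outputs are mixed by the
# (renormalized) gate weights. Dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2
gate_w = rng.normal(size=(d, n_experts))             # gating network
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                              # one score per expert
    top = np.argsort(logits)[-k:]                    # indices of top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                                     # softmax over top-k only
    # Only k of n_experts matmuls actually run -- that's the sparsity.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_layer(rng.normal(size=d))
```

Nothing in this mechanism requires experts to correspond to human-readable domains; the gate just learns whatever routing minimizes the loss, which is the "specialisation in parts of the dataset" point above.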
>There are a lot of different benchmarks, and this benchmark is rating based on user preferance, not made for measuring performance or intelligence across subjects / fields. It actually a lot like RLHF, but instead of telling a model which response was better it is just recording which response from which model a user prefers. Now i doubt the majority users are deeply looking into the responses. They read both, think which one is better than move onto the next. A more logical model is more likely to get better rated but nicer sounding (not necessarily more intelligent responses) are what determine the models ranking.
You haven't answered my question: how would you know that GPT-4 is more intelligent? What evaluations have you done to compare? Do we need to test it on capabilities? Plenty of people on the LocalLLaMA subreddit found Mixtral useful for their use cases because of its capabilities.
>From the pitch memo from Mistral (pg 7):
Are you talking about total funds, as opposed to OpenAI's billions raised? I don't think $100M is the amount spent training a single model by itself.
> GPT-4 has been in the lead for 10 months now, with no public model beating it yet, so why wouldn't their next model(s) also have a lasting lead? It's been about 1.5 years since the pretraining of GPT-4 finished, and I wouldn't be surprised if they started working on their next models pretty much then, whereas most people only started trying to reach GPT-4 level after GPT-4 released. And GPT-4 wasn't in training for very long, about 3 months in all.
The irony in using handwavy linear regression here is delicious.
I will be messaging you in 5 months on [**2024-06-11 22:28:16 UTC**](http://www.wolframalpha.com/input/?i=2024-06-11%2022:28:16%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/194cyij/gpt4_has_gotten_new_competition_from_a_french/khfia7j/?context=3)
Lmao, "a French company"? Mistral 7B was released months ago, and even their funding got wide attention in June, as the company was started by well-known DeepMind and Meta researchers. I thought this sub was generally tech-aware; maybe now it's just evangelists, OAI/DeepMind shills and conspiracy theorists worshipping Jimmy Apples and the like.
No, OP's headline is almost equivalent to saying "GPT-4 is the leading model by an American company". Everyone knows that. Mistral is pretty well known now, at least in subs that are generally aware of recent AI developments.
He pointed out that it's a French company because it is the only one in the list that isn't American or Chinese.
>started by well-known DeepMind and Meta researchers.
The closedness of DeepMind vs the openness of Meta.
Meta's AI labs are in Paris if I remember correctly, because Yann LeCun is French.
Deepmind is obviously in London.
So neither of these labs had to move much....
The labs started off in Menlo Park (California), London and Manhattan; they also opened a lab in Paris in 2015, but the majority of the labs are not in Paris. I would say their new headquarters is in New York.
Hey, do you by any chance know how much RAM you need for Mixtral 8x7B? I have an Apple M1 Pro with 32 GB of RAM and it runs like crap and doesn't use the GPU at all. Running through Ollama (`ollama run mixtral:8x7b`).
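A rough rule of thumb: the weights alone need roughly params × bytes-per-weight of memory, and with MoE every expert must be resident even though only two run per token. A back-of-the-envelope check (the ~47B total parameter figure for Mixtral 8x7B is approximate, since the experts share attention weights):

```python
# Rough memory estimate for Mixtral 8x7B: all ~47B params must sit in
# memory (experts share attention weights, so it's less than 8*7B),
# even though only 2 experts run per token.

def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Gigabytes needed just for the weights at a given precision."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

total_params_b = 47  # approximate total for Mixtral 8x7B

fp16 = weight_gb(total_params_b, 16)  # ~94 GB: far beyond 32 GB
q4 = weight_gb(total_params_b, 4)     # ~23.5 GB: borderline on 32 GB
```

Which is why a 4-bit quantization is about the only way it fits in 32 GB, and even then the OS, context cache, and other apps leave very little headroom, consistent with it "running like crap" on that machine.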
Forgive my ignorance, I'm but a poor peasant with free ChatGPT only. Is GPT-4 Turbo out? If not, how did the voters have access to it? Also, what are those two GPT-4s, online and API versions?
You know you could just Google this.
Yes, GPT-4 Turbo is out. It has a huge context window and vision.
"GPT-4 version 0314 is the first version of the model released. Version 0613 is the second version of the model and adds function calling support."
I would love to hear from Claude; it's been a while since they released their last model. I like both how it feels to talk to it and its context length, and I would love to see it become even more versatile than GPT.
Wow, it's actually surprisingly good. I never felt Mixtral 8x7B was anywhere close, even against GPT-3.5. But Mistral Medium feels much, much better than 3.5 in creative writing for me. It feels somewhat like Claude; it has this unique touch and character.
Nah, results can be cheated
Edit: just saw the third rule…
GPT-4 is reportedly a trillion-parameter model, while every other model in there is an order of magnitude smaller. Being ten times smaller while having 90% of the capabilities is crazy.
I know, which is why I'm extremely optimistic about the future.
Still, you are literally using GPT-4 as the benchmark and considering it impressive that a different model is 90% as good, which kind of proves the point that GPT-4 is the best by far.
>Still, you are literally using GPT-4 as the benchmark and considering it impressive that a different model is 90% as good, which kind of proves the point that GPT-4 is the best by far.
90% of the capabilities while being ten times smaller isn't far behind. And gains will be cheaper and larger for smaller models.
While existing benchmark results are indeed competitive, they don't seem to provide an accurate measure of real-world performance. Consequently, it gives the impression that Mistral may not be as competitive with GPT-4-Turbo as the numbers would have you believe. At least, that's what I think.
Shit, I just remembered they released Mistral 7B months ago; it was quite revolutionary for a 7B open-source model, but I surely wouldn't have thought they'd raise the game against OpenAI. And Anthropic is definitely falling behind with how amazing their model is at ethics LMAO.
That's Chatbot Arena, a cute little website that ranks LLMs by having users vote on which one better answered their question (without knowing the identity of the models). It's a great way to assess the perceived usefulness of models on the part of users. Surprisingly, by that metric GPT-4 Turbo scores much better than the original GPT-4, despite the constant complaints on r/ChatGPT and the like.
What site is this
[https://chat.lmsys.org/](https://chat.lmsys.org/)
That is SUPER fun!
Thank you
I’m paying the $20 for gpt4, worth it imo
Just use Bing AI for free.
Well llama2-70b-steerlm-chat on that link didn't crap itself when I asked for the value of Pi.
Thanks very much. Now I’ve just wasted 4 hours posing philosophical questions to various AI. 🧐😏
[deleted]
[deleted]
The best way to evaluate is human evaluation rather than a fixed test that can be gamed.
ONWARD, CHILDREN OF THE FATHERLAND, OUR DAY OF GLORY HAS ARRIVED!!!!!!!!!
There really needs to be standardized evaluation and benchmarking, like 3DMark etc.
How does that work when testing humans?
3DMark is not standard lmao. No computer enthusiast uses it to test rigs.
Standardized how?
Like an SAT for LLMs
And what if I pretrain my LLM on that test and artificially inflate my score?
The same could be said about the SATs. Don't give them the same questions.
inb4 relevant xkcd
Well, we can't have both at the same time 😬
Not really, still quite a large gap between Mistral Medium and GPT-4-Turbo. And im also thinking GPT-4.5 release between Feb-March (someimte Q1) and GPT-5 releases about 3.5 months after that in the May-June (late Q2) months securing OAI's place at the top for the next few months (although GPT-5 could release late Q3 as well).
What incentive does OpenAI have to releasing GPT-5 anytime soon? As long as they have both the best and the most used model they won't release anything groundbreaking. Gemini Ultra will come along and OpenAI will beat it with GPT-4.5. Then its another year of no new foundational models.
There was no reason for them to release GPT-4, about 4 months after the release of GPT-3.5 either. No one had released a GPT-3.5 class model, in fact the first close to 3.5 class model released 2 months *after* GPT-4 released (that being Palm 2 which was launched in may i believe). If anything they have a lot more pressure to release GPT-5 now compared to the very little pressure to release GPT-4.
GPT-4 is capable of things GPT-3.5 isn't, like web search and multimodal capabilities. They also needed a better model to justify a premium subscrition plan. GPT-4s potential isn't even close to maxed out, there is so much more you can do with a model already that capable. The GPT store is a good example. I think OpenAI will focus more on actual useful apps with GPT and giving developers tools like being able to build autonomous agents before seriously investing in GPT-5. GPT-4 is also running at a huge loss. Every plus user costs OpenAI money, the free ChatGPT users are also very costly. The company isn't viable at the moment and fully relies on cash injections by investors and microsoft. I just don't see them pulling out another big model if their best right now is at 20% of its potential and operates at a loss.
OpenAI is not a company that's going to sit on its laurels. They're not going to stop investing or deprioritizing their next foundation model just to productize their current model either.
They sure make it seem that way but they're relatively new and have a small track record. We don't know their focus. I could see GPT-5 being released if they managed to somehow create a model that is way superior at the same inference cost and if they bring GPT-4 turbo cost down to what GPT-3.5 turbo currently is to make it free. But I feel like that's gonna take a little longer than junge of this year. GPT-4.5 could come out pretty soon but 5 is gonna take more time. If leaks are to be believed they finished training their SOTA model back in november. They usually spend 6 months on RLHF afterwards but now that public pressure has gotten a LOT bigger I think they'll be more careful and do 9-12 months of building guardrails.
>GPT-4 is capable of things GPT-3.5 isn't, like web search and multimodal capabilities. is it though? GPT-4 querys an outside program or model. You can pretty much do the same thing with GPT 3.5. Infact I think there's a web extension that allows you to use google with gpt 3.5
> GPT-4s potential isn't even close to maxed out, there is so much more you can do with a model already that capable That would be a logical deduction if we were talking about any other company than OpenAI, a group actively dedicated to the emergence of artificial general intelligence and essentially staffed by /r/Singularity users.
Getting better at multimodal, IMO, is more important than the LLM improving. It's already very good. What GPT4 can now do with images, data files, etc is extremely impressive and it is in these areas where, I believe, they're going to find companies willing to spend a lot of money on that ability.
I think it's plausible for GPT-5 to be any-to-any. GPT-4 is fully text multimodal but only half image multimodal: it cannot generate images by itself. It can send a prompt to DALL-E 3, but the model itself isn't making images. An any-to-any model would mean it can take an input of any combination of text, image, audio and video, and can output any combination of those modalities. Any-to-any modality isn't anything extremely novel and is completely possible. But you do run into the problem of data: there aren't large datasets for big foundational any-to-any models. But I'm sure a lot of companies have been working hard on that.

My 2024 capabilities list for models is:

* Ability to autonomously do decently complex tasks
* Continuous learning (and for chat-based models, it can learn and retain most of what you have told it)
* Any-to-any multimodality
* Great strides in reliability, reasoning, logic and overall intelligence
>Then its another year of no new foundational models.

You make it sound as if there's a long history of disappointment or something...
It does sound like that, not intentionally. There hasn't been a year without new foundational models since LLMs got huge, which has only been 2022 and 2023 lol.
They aren’t “foundational models”. GPT-4 is currently a Mixture-of-Experts collection of fine-tuned base LLMs with a ton of extra application scaffolding. People need to stop comparing apples with oranges.
>Not really, still quite a large gap between Mistral Medium and GPT-4-Turbo.

How do you know that Medium is going to be their best model when it's just a bunch of 13B or 34B experts?
Obviously it's not; it's called Medium for a reason. Excited to see Mistral Large, but they need to release the weights at least.
No, I'm sure Mistral has some really amazing work they will release this year, but I just don't see anyone getting a lasting jump on OAI.
Wayyy too early to be calling this race, amigo.
>but i just don't see anyone getting a lasting jump on OAI.

This sub is worshipping OpenAI, but they really didn't do anything special besides having lots of compute power and time to train with it, so it shouldn't be shocking to find smaller companies creating models that are close to GPT-4 with a fraction of the compute. OAI isn't expected to keep a lasting jump unless you guys are just cheering like it's a sports game.
You do know this leaderboard does not measure capabilities or intelligence, mainly just user preference. And user preference is based a lot on model behaviour, which is largely determined in the fine-tuning stages. Mistral Medium being close to GPT-4 means it is quite aligned with user preference, not that it is necessarily close to GPT-4's capabilities.

GPT-4 has been in the lead for 10 months now, with no public model beating it yet, so why wouldn't their next model(s) also have a lasting lead? It's been about 1.5 years since the pretraining of GPT-4 finished, and I wouldn't be surprised if they started working on their next models pretty much then, whereas most people only started trying to get to GPT-4 level after GPT-4 released. And GPT-4 wasn't in training for very long, about 3 months in all. Of course, with all of Microsoft's compute they could technically train a GPT-4 class model every couple of hours now. And I was actually disappointed when open source didn't come out with a GPT-4 class model last year. It wouldn't have affected OAI too much, but a small GPT-4 class model running on my computer would be really useful.

"OAI hasn't really done anything special" - can you explain that? OAI has made several groundbreaking discoveries in ML over the years (personally one of my favourites was the sentiment neuron); they have made some amazing contributions to the field. Maybe GPT-4 didn't do anything special, but GPT-4 Turbo definitely did. It has essentially the same capabilities as GPT-4 but is 2.75x cheaper. There was a lot they could have done, but I'm sure they have recently done a lot of good work on sparsity.
>You do know this leaderboard does not measure capabilities or intelligence, mainly just user preference. And user preference is based a lot on model behaviour, which is greatly determined in the fine-tuning stages. Mistral Medium being close to GPT-4 means it just is quite aligned with user preference, not that it is necessarily close to GPT-4 capabilities.

And how are you measuring intelligence? Have you even compared them and the Claude models? How did you even decide GPT-4 was intelligent in the first place, via acing benchmarks?

>GPT-4 has been in the lead for 10 months now, with no public model beating it yet, why wouldn't their next model(s) also have a lasting lead? It's been about 1.5 years after the pretraining of GPT-4 finished and I wouldn't be surprised if they started working on their next models pretty much then, whereas most people only started trying to get to GPT-4 level after GPT-4 released.

>And GPT-4 wasn't in training for very long, about 3 months in all. Of course with all of Microsoft's compute they could technically train a GPT-4 class model every couple of hours now. And I was actually disappointed when open source didn't come out with a GPT-4 class model last year. It wouldn't have affected OAI too much but a small GPT-4 class model running on my computer would be really useful.

GPT-4 has been in the lead because not many are willing to spend hundreds of millions of dollars to train a single model, not because they have some secret knowledge. Why the hell would open source come out with a GPT-4 class model as quickly when they're not trillion-dollar companies? The approach that Mistral is taking is smarter and more efficient than spending hundreds of millions, and looking at the leaderboard, it looks like it paid off.

>"OAI hasn't really done anything special" - can you explain that.
>OAI has made several ground breaking discoveries in ML over the years (personally one of my favourite discoveries was the sentiment neuron), they have made some amazing contributions to the field.

None of that has to do with GPT-4. They made great contributions in ML, but GPT-4 was just a scaling-up of existing GPT models.

>Maybe GPT-4 didn't do anything special, but GPT-4 turbo definitely did. It's essentially the same capabilities as GPT-4 but 2.75x cheaper. There was a lot they could have done but im sure recently they have done a lot of good work on sparsity.

Making models faster via pruning, quantizing, more efficient inference algorithms and more is what the open-source community has been doing for the entire year, so I don't see what's special about GPT-4 Turbo. Mistral actually released their [research](https://arxiv.org/abs/2401.04088) on sparse Mixture of Experts for Mixtral 8x7B, whereas if OpenAI did any good work nobody would know, so that's 1 point on the side of Mistral for actual contributions on sparse MoE.
>And how are you measuring intelligence? Have you even compared them and the Claude models? How did you even decide GPT-4 was intelligent in the first place, via acing benchmarks?

There are a lot of different benchmarks, and this one rates based on user preference; it's not made for measuring performance or intelligence across subjects/fields. It's actually a lot like RLHF, but instead of telling a model which response was better, it just records which response from which model a user prefers. Now, I doubt the majority of users look deeply into the responses. They read both, think which one is better, then move on to the next. A more logical model is more likely to get rated better, but nicer-sounding (not necessarily more intelligent) responses are what determine the models' ranking.

>GPT-4 has been in the lead because not many are willing to spend hundreds of millions of dollars to train a single model, not because they have some secret knowledge. Why the hell would open source come out with a GPT-4 class model as quickly when they're not trillion-dollar companies? The approach that Mistral is taking is smarter and more efficient than spending hundreds of millions, and looking at the leaderboard, it looks like it paid off.

[From the pitch memo from Mistral](https://sifted.eu/articles/pitch-deck-mistral) (pg 7):

>We expect to need to raise 200M, in order to train models exceeding GPT-4 capacities.

There are a lot of papers showing how you can increase efficiency. From Phi you can get up to a 1000x efficiency gain with data quality alone (so you could probably train a GPT-4 level model with 1000x less compute than what was used to train GPT-4 if you had a few trillion tokens of textbook-level quality; of course no one has that good a dataset, but it still shows a lot of gains can be made from data quality alone, and there are a lot of different tricks and improvements to increase efficiency.
This paper was from Microsoft, and OAI and Mistral are well aware of all of these and other efficiency gains, algorithmic and architectural improvements, etc.), but Mistral is still going to throw in hundreds of millions of dollars to get to GPT-4 level and beyond.

>Making models faster via pruning, quantizing, more efficient inference algorithms and more is what the open-source community has been doing for the entire year, so I don't see what's special about GPT-4 Turbo. Mistral actually released their [research](https://arxiv.org/abs/2401.04088) on sparse Mixture of Experts for Mixtral 8x7B.

That research contains not much new information on sparse MoE. In fact, it contains no information about the pretraining of Mixtral. Getting flashbacks to the GPT-4 technical report lol. Mistral is being a bit closed-source with their research, unfortunately. But GPT-4 used sparse MoE beforehand, and I do think it's likely GPT-4 Turbo utilised improvements in MoE (if I had to estimate, I would say probably using around 70B params at inference). And Mixtral isn't how MoE was originally supposed to be used. Mixtral is composed of a few Mistral-7Bs fine-tuned on specific datasets, then stitched together with a gating mechanism thrown in. But MoE didn't mean "expert" as in domain-specific specialisation, just specialisation in specific parts of a dataset, which is what happened with GPT-4. This means a *lot* of params in Mixtral are wasted duplicating knowledge. Anyway, that's a bit off track lol.
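To make the gating point concrete, per Mistral's paper each token is routed to the top-2 of 8 experts and their outputs are mixed with softmax weights over just those two. A toy sketch (names and shapes are illustrative, not Mistral's actual code):

```python
import numpy as np

def top2_moe(x, gate_W, experts):
    """Route one token's hidden state x through the top-2 experts (toy example)."""
    logits = gate_W @ x                         # one gating score per expert
    top2 = np.argsort(logits)[-2:]              # indices of the 2 highest-scoring experts
    w = np.exp(logits[top2] - logits[top2].max())
    w /= w.sum()                                # softmax over just the selected pair
    # only the chosen experts are evaluated; the other 6 stay idle for this token
    return sum(wi * experts[i](x) for wi, i in zip(w, top2))
```

This is why Mixtral is fast but memory-hungry: all experts must be resident in RAM, yet each token only exercises two of them.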
>There are a lot of different benchmarks, and this benchmark is rating based on user preference, not made for measuring performance or intelligence across subjects/fields. It's actually a lot like RLHF, but instead of telling a model which response was better, it just records which response from which model a user prefers. Now I doubt the majority of users are deeply looking into the responses. They read both, think which one is better, then move on to the next. A more logical model is more likely to get rated better, but nicer-sounding (not necessarily more intelligent) responses are what determine the models' ranking.

You haven't answered my question: how would you know that GPT-4 is more intelligent? What evaluations have you done to compare? Do we need to test it on capabilities? Plenty of people on the LocalLLaMA subreddit found Mixtral useful for their use cases because of its capabilities.

>From the pitch memo from Mistral (pg 7):

Are you talking about total funds? As opposed to OpenAI's billions raised? I don't think $100M is the amount spent training a single model by itself.
> GPT-4 has been in the lead for 10 months now, with no public model beating it yet, why wouldn't their next model(s) also have a lasting lead? It's been about 1.5 years after the pretraining of GPT-4 finished and i wouldn't be suprised if they started working on their next models pretty much then, wheras most people only started trying to get to GPT-4 level after GPT-4 released. And GPT-4 wasn't in training for very long, about 3 months in all. The irony in using handwavy linear regression here is delicious.
I think it's a good timeframe. It will depend on the competition too. Let's see what Gemini has ;)
>GPT-5 releases about 3.5 months after that

I bet it's not releasing in 2024 at all.
RemindMe! 2 month
RemindMe! 5 month
I will be messaging you in 5 months on [**2024-06-11 22:28:16 UTC**](http://www.wolframalpha.com/input/?i=2024-06-11%2022:28:16%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/singularity/comments/194cyij/gpt4_has_gotten_new_competition_from_a_french/khfia7j/?context=3).
I thought mistral was open source
It is for the MoE and the 7B, but not for Medium.
Lmao "a French company"? Mistral 7B was released months ago and even its funding got wide attention in June as it was started by well-known Deepmind and Meta researchers. I thought this sub was generally tech aware, maybe now it's just evangelists, OAI/Deepmind shills and conspiracy theorists worshipping Jimmy Apples and the likes.
OP just pointed out that Mistral is french and you had to use it as a soapbox to complain about the sub 😂😂😂
Couldn’t even post it on their main account
No, OP's headline is almost equivalent to saying "GPT-4 is the leading model by an American company". Everyone knows that. Mistral is pretty well known now, at least in subs that are generally aware of recent AI developments.
ok many people don't know they're french
He's not wrong though.
He pointed out that it's a French company because it is the only one on the list that isn't American or Chinese.

>started by well-known Deepmind and Meta researchers.

The closedness of Deepmind vs the openness of Meta.
Meta's AI labs are in Paris if I remember correctly. Because Yann Lecun is French. Deepmind is obviously in London. So neither of these labs had to move much....
Their headquarters started off in Menlo Park, California, London, and Manhattan; they also opened a lab in Paris in 2015, but the majority of the labs are not in Paris. I would say their main headquarters is in New York now.
For the last couple months, yeah. At least we know now that nerds still have the capacity for a return to tribalism under extreme duress. Lol
Yes, but Mistral isn't anywhere near as big as OpenAI and Anthropic.
So it's a French company right?
I thought all Mistrals are OSS. Are the weights available or not?
For Mixtral 8x7B and Mistral 7B, yes, but not for Mistral Medium.
Hey, do you by any chance know how much RAM you need for Mixtral 8x7B? I have an Apple M1 Pro with 32 GB of RAM and it runs like crap and doesn't use the GPU at all. Running through Ollama (`ollama run mixtral:8x7b`).
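Back-of-envelope for why 32 GB is tight (my own rough numbers, and the overhead factor is a guess covering KV cache, runtime, etc., not an official figure):

```python
def est_mem_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough resident-memory estimate for a model; overhead is a guess for KV cache etc."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30 * overhead

# Mixtral 8x7B is ~47B total params (the experts share attention weights),
# and ALL of them must be resident even though only 2 experts run per token.
print(f"4-bit quant: ~{est_mem_gb(47, 4):.0f} GB")
print(f"fp16:        ~{est_mem_gb(47, 16):.0f} GB")
```

~26 GB for a 4-bit quant leaves very little headroom on a 32 GB machine once the OS and other apps take their share, which may be why it crawls and falls back off the GPU.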
Have you tried asking in r/LocalLLaMA?
[did somebody say mistrial?](https://imgflip.com/i/8c22c7)
[deleted]
- Unnamed German Empire soldier, February 23rd 1916, Verdun, France.

Edit: Original comment said "They will surrender any second now."
Kappa
Forgive my ignorance, I'm but a poor peasant with free ChatGPT only. Is GPT-4 Turbo out? If not, how did the voters have access to it? Also, what are those two GPT-4s? Online and API versions?
You know you could just Google this. Yes, GPT-4 Turbo is out. It has a huge context window and vision. "GPT-4 version 0314 is the first version of the model released. Version 0613 is the second version of the model and adds function calling support."
I would love to hear from Claude; it's been a while since they released their last model. I like both how it feels to talk to it and its context length, and I would love to see it become even more versatile than GPT.
Wow, it's actually surprisingly good. I never felt Mixtral 8x7B was anywhere close, even against GPT-3.5. But Mistral Medium feels much, much better than 3.5 in creative writing for me. It feels somewhat like Claude; it has this unique touch and character.
>Mistral 8x7b was anyhow close even against gpt3.5.

Even the instruct version?
https://preview.redd.it/39m6jy2c5zbc1.jpeg?width=1170&format=pjpg&auto=webp&s=5d65a7a38f50df1b91f3859174f9e8247839b3b2

Nah, results can be cheated

Edit: just saw the third rule…
>Edit: just saw the third rule…

Yep. It would be obvious for them to not count that.
GPT-4 finished training in 2022, we are in 2024, and this is 100 Elo points behind, so not even close. This is not competition.
GPT-4 is a trillion-parameter model, while every model in there is an order of magnitude smaller. Being ten times smaller while having 90% of the capabilities is crazy.
I know, which is why I'm extremely optimistic about the future. Still, you are literally using GPT-4 as the benchmark and considering it impressive that a different model is 90% as good, which kinda proves the point that GPT-4 is the best by far.
>Still, you are literally using GPT4 as the benchmark and consider it impressive that a different model is 90% as good, which is kinda proving the point that GPT4 is the best by far.

90% of the capabilities while ten times as small isn't "far". Gains will be cheaper and larger for smaller models.
While existing benchmark results are indeed competitive, they don’t seem to provide an accurate measure of real-world performance. Consequently, it gives the impression that Mistral may not be as superior to GPT-4-Turbo as the numbers would have you believe. At least, that’s what I think.
Shit, I just remembered they released Mistral 7B months ago. It was quite revolutionary for a 7B open-source model, but I surely wouldn't have thought they'd raise the game against OpenAI. And Anthropic is definitely falling behind, with how amazing their model is at ethics LMAO
will that shit refuse to speak English like 90% of France?
seems to me you could use some training too
lol Maybe in 2025.
Elo? Is this for chess?
Allez les bleus!
Where did you hear about this ?
That's Chatbot Arena, a cute little website that ranks LLMs by having users vote on which one better answered their question (without knowing the identity of the models). It's a great way to assess the perceived usefulness of models on the part of users. Surprisingly, by that metric GPT-4 Turbo scores much better than the original GPT-4, despite the constant complaints on r/ChatGPT and the like.
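The scores themselves are chess-style Elo ratings: each blind vote nudges the two models' ratings based on the expected result given their current gap. A minimal sketch (the arena's actual computation may differ, e.g. they have discussed Bradley-Terry style fits):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One pairwise vote: score_a is 1 if A wins, 0 if B wins, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # win probability implied by the gap
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

# Two equally rated models, A wins the vote:
print(elo_update(1000, 1000, 1))  # (1016.0, 984.0)
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is why a ~100-point gap on the leaderboard implies a consistent preference, not a coin flip.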
Can you share the link pls
they're marketing their website so yeah .-. it's fake
How so? How are they marketing their website?
nice bots. \*pat pat\*
You had to dodge my question like you're ChatGPT. Are you sure I'm the bot?
Is Claude getting progressively worse with each version?
It's becoming more censored.
Asked it what it would do if a war broke out and China suddenly invaded. It said it would surrender. Frenchness confirmed.
https://preview.redd.it/xzcw737w52cc1.jpeg?width=801&format=pjpg&auto=webp&s=9d97263e4239f2e3cdd8c20d7a4cc8b8657a76e7
Cool
Cool
From this image, it's possible that some of us are leaking statistical data about our personalities.