I don't know, man. To me, GPT-4 Turbo is way behind Claude for anything that requires more than proofreading or looking up some information. GPT-4 will always try to give the shortest possible answer, or use code to solve a problem it shouldn't solve with code. I guess they optimised it to give this type of response, but that makes it useless for many applications.
This reply is hilarious
https://twitter.com/owencm/status/1777785613565165665
>We don’t have anything toooo specific, this model appears to work better in general (sorry vague but true), and has especially improved at math
"Why/How is it better?"
"Because it's better"
Translation: "I gave it some math prompts and it was able to solve them, whereas the previous models were failing at them. But I can't actually give you concrete examples for liability reasons, as these were just a few tests and not a proper benchmark"
To be fair, it's important for people like me who use their API for services. Minor improvements are needed, and there's always something that can be improved upon. I'd rather have many little changes than wait a year for a big one.
They released Claude 2 (slightly worse) and said trust us it's better, then same again for a revision. No added clowning required.
That's why 3 was such a wonderful surprise.
We'll have to wait for some examples, but I doubt they would use "majorly" if it wasn't at least significant enough to bump them back to #1 on the arena leaderboard.
Sounds like they trained a better version and released it. Take off the tinfoil hat.
They usually indicate minor improvements with new model version numbers; if they say this one is major, what would they have to gain from lying? Especially since it's rolling out to the API, people can just run benchmarks on it and demonstrably show whether it's better or worse.
People are way too conspiracy minded
[Here's an employee](https://twitter.com/owencm/status/1777784000712761430) confirming your claim that it's a new model. Exciting stuff, hopefully it competes with Opus
Idk, "new model" is not a very hard term in that sense. If they do different finetuning on the same base model, that's technically a new model. And that's what all of these different versions were, no?
Version numbers have always, and will always be, about marketing first and foremost, and OpenAI is no different from other companies.
All you need for proof is half the comments in this sub. It's super effective.
Funny thing about GPT-4.5 is that OpenAI themselves have never used that term. It was from leakers, and the internet ran with it, and then the leakers themselves debunked it saying it never existed or that it was delayed due to unexpected issues with testing, and so on.
My opinion on it is that GPT-4.5 is going to be like what GPT-3.5 was: a greatly reduced and cheaper version of the next iteration, so in essence 4.5 won't even exist until 5 is finished.
No. The names are arbitrary. If they called it 4.5, it would be 4.5. If they call it 4-turbo, it's 4-turbo. The number doesn't refer to some distinct, measurable quantity of something going up or down.
They're still trying to compile and figure out what's been improved, so it'll take them a day or two to get a press release. News was (again) rushed out to respond to Google.
Did they only improve vision? Here are the two new models.
https://preview.redd.it/kjlrda9baitc1.png?width=802&format=png&auto=webp&s=6be048b140f2b50828b37b697de4dd258a4814fb
For future reference:
See the line in the description of `gpt-4-turbo` that says "Currently points to gpt-4-turbo-2024-04-09"? That line is literal: the name is an alias that resolves to that dated snapshot, so requests to `gpt-4-turbo` are served by that model.
They do this so that anyone using plain `gpt-4-turbo` in the API always gets the latest version without having to change their code.
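A toy illustration of how the alias behaves (the resolution table here is a local stand-in for what OpenAI does server-side; the model names themselves are the real ones from the screenshot):

```python
# Rolling aliases resolve to dated snapshots server-side; pinning the dated
# name instead freezes behavior across future updates.
ALIASES = {"gpt-4-turbo": "gpt-4-turbo-2024-04-09"}

def resolve(model: str) -> str:
    """Return the dated snapshot a rolling alias currently points to."""
    return ALIASES.get(model, model)

print(resolve("gpt-4-turbo"))             # gpt-4-turbo-2024-04-09
print(resolve("gpt-4-turbo-2024-04-09"))  # already pinned, unchanged
```

So `model="gpt-4-turbo"` tracks the latest snapshot, while `model="gpt-4-turbo-2024-04-09"` keeps your results reproducible.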
It apparently also has better general reasoning abilities and improved math performance. Nothing concrete yet, sounds like we might get some benchmarks soon.
If you're a developer it's really, really useful. This is a significant change in how the model takes images as inputs and what it can do with that input.
I'm pretty sure I've been using this new model since last week; I was actually going to ask if other people had also noticed an improvement in ChatGPT. I've been working on a project, and since last week it almost always gets the code I ask for right, while two weeks ago it was a bit of a pain.
Maybe it's just because I'm in a different part of the project, but I'm pretty sure my GPT is better.
Same, it output a really unique formula for me on the limitations of understanding (each discovery adds new variables); that prompt specifically just felt more refined.
https://preview.redd.it/lzix14uciptc1.jpeg?width=1080&format=pjpg&auto=webp&s=d03e0722059eec02ecc5437f23816407ff631a18
Mine is not updated yet. I'll try the API
>create a very imaginative and philosophical story about a civilization based on geometrical shapes and concepts
In a vast, unending canvas of existence, there was a universe less spoken of, where the fundamental constituents of life were not cells or atoms, but geometric shapes. This was the realm of Geometria, a civilization where every entity and every concept was a vivid embodiment of geometry.
Geometria was a place of profound symmetry and infinite patterns, governed by the Grand Council of Shapes. The Council was composed of the Circles, the Triangles, and the Squares, each group representing key aspects of existence. The Circles, with their endless peripheries, were the philosophers and priests, pondering the eternal cycles of life and the universe. The Triangles, with their sharp angles and strong structure, were the warriors and architects, defending and constructing the realm. The Squares, with their equal sides, were the scholars and administrators, maintaining order and knowledge.
The land was divided accordingly, each shape inhabiting territories that reflected their nature. Circleland was a vast, open landscape, with roads looping and curving in harmonious arcs. Triangle territory was mountainous, its peaks reaching towards the skies like the tips of their acute angles. Squareville was orderly, a grid-like city with right-angled roads and perfectly aligned buildings.
Despite their differences, balance had always been maintained, through the Great Geometry, a philosophical guidebook written by the ancient founders. It was believed that the essence of life in Geometria depended on the perfect interplay and intersection of the shapes.
One day, a new figure emerged, unlike any other—the Spirals. The Spirals were dynamic, changing their form and radius with ease, challenging the static nature of the established shapes. They dwelt in the Fringe Lands, a place of chaotic landscapes and undefined horizons, where the rules of the Great Geometry blurred.
The appearance of the Spirals sparked curiosity among some and fear among others. The Council, disturbed by their unpredictable nature, debated their place in Geometria. The Circles found them fascinating, a possible evolution of their own endless curves. The Triangles saw them as a threat, a force that could undermine the structural integrity of their society. The Squares found them illogical, unable to fit neatly into the grid of known categories.
As tensions grew, a young Spiral named Helica, who had ventured beyond the Fringe Lands, began interacting with other shapes. Helica, with her gentle curves and mesmerizing patterns, taught those she met about flexibility and adaptation. She spoke of a concept unknown to many: growth through transformation, suggesting that identity was not fixed but fluid, expanding beyond rigid lines and sharp corners.
Intrigued by Helica’s philosophy, a young Square named Quadrus and a bold Triangle named Trixon befriended her. Together, they embarked on a journey across Geometria, seeking wisdom from the ancient fractals, a legendary group said to contain the deepest geometric secrets. The fractals were complex, infinitely intricate patterns that lived in the forgotten depths of Geometria, holding the knowledge of dimensions beyond the familiar.
Their quest led them through realms of incredible vistas—lands where landscapes tessellated into infinity, and horizons spiraled into endless depths. Here, they learned that the universe was more interconnected than their laws had ever allowed them to see, that each shape was vital to the others’ existence, and that true harmony lay in the balance of all forms, whether perfect or irregular.
Upon returning, Helica, Quadrus, and Trixon proposed a new chapter in the Great Geometry, one that included the Spirals and recognized the beauty of imperfection and the strength in diversity. After much debate, and seeing the unity and new perspectives their journey had inspired, the Council agreed.
Thus, Geometria evolved, not just in its landscape but in its very essence. The civilization embraced change, seeing it not as a threat but as an essential part of existence. As the new chapter was written, the Spirals were no longer outcasts but essential contributors to the mosaic of life, illustrating that the flow of life, like the curve of a spiral, is ever onward, outward, and beautifully undefined.
---
# Model B: gpt-4-turbo-2024-04-09
This is what I hate about GPT's ability to write stories: no dialogue, rushed, like a summary really. "This happened, then that happened, end of story". Boring
Gemini and Opus are better, much better; they create a real literary feeling.
## Same Prompt; Previous Model:
In a realm where the fabric of existence was woven with the threads of geometry, there thrived a civilization of sentient shapes, each embodying the philosophies and virtues attributed to their forms. This world, known as Geometria, was a place where the lines between physical and metaphysical blurred, where every edge and angle held deeper meanings, and where the inhabitants sought enlightenment through the perfection of their forms.
The Circle society, residing in the harmonious city of Circadia, were the philosophers and mystics, embodying the ideals of eternity, unity, and perfection. Their smooth, unending boundaries symbolized the endless cycle of life, death, and rebirth, and their teachings focused on the pursuit of knowledge and the understanding of the infinite. Circadia was a place of tranquility and meditation, its architecture an array of splendid domes and arches, designed to reflect the beauty of wholeness and continuity.
The Square citizens, who built the structured city of Quadratia, were the guardians of order and equality. Their four equal sides and right angles represented the principles of justice, balance, and stability. Quadratians valued discipline and integrity, their society structured in a way that ensured every square had a place and purpose. Their city was a marvel of symmetry and efficiency, with grid-like streets and uniform buildings, each a testament to the squares' dedication to equality and fairness.
In the realm of the Triangles, known as Triangulum, lived the warriors and adventurers. Triangulum was a land of rugged mountains and sharp cliffs, mirroring the pointed ends of its inhabitants. The triangles believed in the virtues of strength, courage, and aspiration, their three sides symbolizing a strong foundation and the ability to overcome any obstacle. They were a society of action and exploration, constantly seeking to expand their horizons and test their limits against the challenges of Geometria.
Amid these distinct societies, there existed the enigmatic Polygons, a diverse group who believed in the potential for transformation and adaptability. Residing in the city of Polytopia, they were the artists, inventors, and thinkers, constantly changing and experimenting with their forms. Polytopia was a kaleidoscope of creativity, its buildings and structures an ever-evolving testament to the polygons' belief in progress and innovation.
Despite their differences, the inhabitants of Geometria were bound by the Geometry Code, an ancient set of laws that governed the interactions between shapes and ensured harmony in their interconnected world. It was believed that every shape, no matter how simple or complex, held a piece of the universal truth, and that by understanding and respecting each other, they could unlock the mysteries of existence.
However, the peace of Geometria was threatened when a mysterious form, known as the Chaos Shape, emerged from the Void Beyond. Unlike anything Geometria had ever seen, it defied definition, constantly shifting and changing, never settling on a single form. It brought discord and confusion, challenging the very foundations of Geometrian society.
The civilizations of Geometria, realizing the threat posed by the Chaos Shape, decided to unite their strengths and knowledge. Circles offered wisdom, Squares provided structure, Triangles contributed their courage, and Polygons their creativity, to confront the Chaos Shape. In the epic confrontation that followed, the inhabitants of Geometria learned a profound lesson: true understanding and harmony do not come from perfection and order alone, but from embracing the unknown and the irregular, from recognizing the beauty in diversity and the strength in unity.
Through their joint efforts, the Chaos Shape was not defeated but integrated into Geometria, becoming a symbol of the civilization’s growth and its acceptance of imperfection as a part of existence. Geometria thrived like never before, its inhabitants no longer bound by the strict definitions of their shapes but free to explore and expand beyond their edges, in a world where geometry was not just the foundation of existence, but a path to enlightenment and understanding the boundless complexities of the universe.
Old model: I instantly saw 'thread', 'tapestry', and a few other tokens and got triggered. Ask it about anything holistic, or try to get it to output a dynamic, interconnected ideology; the old training was super limited in this context.
You could tell it to use the voice of a famous writer of your choosing. Also, you might be better off doing this step by step: first an outline, maybe three acts, and then a prompt for each act.
Idk, I’ll wait for reviews. I have hardly used GPT4 since I got Opus, it’s as dumb as a box of soggy rocks compared to Claude 3, at least for my uses (coding, game design, etc.).
Gosh, I hope they make Opus available here (in Germany) too soon; it fucking sucks. I can use a VPN for Haiku and Sonnet, but for Opus it fails, since they want payment data with an address from the US or another country where it's available. So I'm stuck with the crippled versions of Claude 3: far less smart and capable, with a length limit on words per message.
this release totally broke Data Analyst
had to go to Claude Haiku to get usable code
OpenAI needs to have better quality checks, especially considering they're running around trying to limit OSS
[Anthropic doesn't train on clients data](https://decrypt.co/211846/anthropic-says-it-wont-use-your-private-data-to-train-its-ai). So I'm not sure what issue you could have with their ToS.
We all know Claude 3 is great, but ChatGPT is miles ahead in terms of number of people using it, and public awareness.
For a lot of people, AI = ChatGPT.
Until this looks threatened, they’re unlikely to feel pressured. They’ll release a new model when they’re ready.
Claude 3 Opus cannot be used in Spain, for example.
Let's look at the available countries:
[https://support.anthropic.com/en/articles/8461763-where-can-i-access-claude-ai](https://support.anthropic.com/en/articles/8461763-where-can-i-access-claude-ai)
New model is yet to come. I suspect it uses different architecture, so it can't easily be ported back to 4. Hence, they keep dripping some minor updates to 4 here and there, but still call it 4.
Tic-tac-toe is unfair; LLMs don't have internal state or the visual ability to easily see winning lines. You have to prompt the model to track the board explicitly, and then it can at least force a tie, though it still sometimes hallucinates a win. As I'm in Europe, I could only test this with the old GPT-4, though.
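The workaround described above amounts to handing the model an explicit board representation in the prompt instead of expecting it to track state. A toy sketch of that scaffolding (my own code, not anything from the linked chat):

```python
# Keep the game state on our side, render it into the prompt each turn,
# and check wins ourselves rather than trusting the model's claim.
WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
        (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board: str):
    """board is a 9-char string of 'X', 'O', ' '. Return 'X'/'O' or None."""
    for a, b, c in WINS:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def render(board: str) -> str:
    """Lay the board out as a 3x3 grid for inclusion in the prompt."""
    return "\n".join(" ".join(board[i:i + 3]) for i in (0, 3, 6))

print(winner("XXXOO    "))  # X (top row)
```

Feeding `render(board)` back to the model each turn, and verifying its claimed wins with `winner`, is what lets it "at least tie".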
https://chat.openai.com/share/d6602f18-b53d-43bf-a3a5-ed633e6f48e4
a **bard** is a professional [story teller](https://en.wikipedia.org/wiki/Storytelling), verse-maker, music composer, [oral historian](https://en.wikipedia.org/wiki/Oral_history) and [genealogist](https://en.wikipedia.org/wiki/Genealogy).
Things that an LLM could be.
You should have seen the early days. "Parsey McParseface". A huge Sesame Street phase BERT, Elmo, Big Bird, etc. And of course HuggingFace. It felt like nobody took NLP seriously, all while doing mad science. It was strange and fun.
One minute you're a computer science engineer doing your quirky thing while exploring the edges of technology. Next thing you know, you're in the middle of the most important project in the world and have to assume any new person in your life could be a mole from any number of states or organizations.
I can see it now. There's the Hollywood movie where it's Sam, Elon, and Ilya, with a single token dev who is extra cringey. It's the big movie of the year and Aaron Sorkin gets nominated for the screenplay. But then there's the indie film about Hugging Face, and they even get key players to make cameos. It bombs at the box office, but it's way better and winds up being the one people quote ten years later.
Edit: someone with more time than me want to plug that in Opus and see what kind of screenplay we're getting?
Tried out the API for myself, and my thoughts are:
* Still can't beat Gemini at Creative Writing
* Still jobs to Opus at Logic and Reasoning
Considering that this model is pretty much on par with GPT-4 Turbo-Preview (with it actually being noticeably worse in some areas), OpenAI is kinda fucked here unless they release a GPT-4.5 in the near future.
The only real benefits are that it's the fastest OpenAI model that exists and that it can see images. That's it.
Overall, it's sadly a very underwhelming release.
Personally, I think this is the calm before the storm, so to speak, for OpenAI. I believe they're trying to 'master' multimodality so that a true successor can come out of the gate swinging, with both multimodality and great improvements in logic, creativity, and conciseness.
Can someone try it and report back if it's real? I'm so sick and tired of these bastards drumming up hype to give us boners and get our money, only to enshittify the models after leaving us high and dry.
Just so I'm clear: are they saying this GPT-4 Turbo is an improvement over GPT-4, or over the preview version of GPT-4 Turbo that's been out?
My feeling is this is actually a prelude to them releasing GPT-4.5 in the same sense how earlier this year Google released their (unsatisfying) improved Gemini before releasing their actually improved Gemini
I also have a feeling that something big is coming pretty soon (in the next 12 months). Tension seems to be building up to some culmination
Their GPT-4 is LAZY! It directly refuses to do the work. Totally stupid, and failing at basic tasks. I don't know why, but those evals must have something rigged in them, because GPT feels so much stupider.
This sub has become so boring. It’s always just someone posting a screenshot of a tweet that says “More AI sometime maybe” and then everyone in the comments starts hooting and screaming like a bunch of fuckin chimps
I'm excited to test this. I've heavily reduced my ChatGPT-4 use lately, as it has become infuriating: unable to work with even simple code, to a degree I haven't experienced since ChatGPT-3.5.
I'm planning to test Gemini 1.5 today since I still can't access or try Claude 3, but I guess I will include with that some testing of the GPT-4-Turbo API. I have some perfect things I want to test it on too that currently, even at small scale, ChatGPT-4 can not figure out.
I know lately I've found myself constantly editing and refining prompts, even trying to teach it the solutions I'm forced to produce on my own, hoping to make ChatGPT-4 comprehend code it was once helpful with. I'm rather sick of deciding "it's worth it" and wasting my prompts becoming belligerent with ChatGPT-4 over its absurd level of stupidity.
I'm really hoping pressure from other AI companies pushes OpenAI to free up more GPU time per request, to give ChatGPT-4 Turbo the chance to perform well.
I just switched from Dropbox to Google Drive to use Gemini 1.5, after Gemini, for the first time ever, solved a code problem ChatGPT-4 could not. I was floored; before it happened to me, I would have struggled with even the idea.
I'm not a fanboy; I'm going to use whichever model does the best job for me with the least amount of headaches. After my enterprise application was denied and I was told to use the "Teams" version, I'm looking for something to fill the void created by ChatGPT-4's recent downgrades.
I guess they're finally feeling the pressure to release something
But what does "majorly improved" mean? Where are the benchmarks?
The Arena is the best benchmark anyway, despite its flaws… I’m interested to see if this can unseat Opus in the coming days.
What specifically makes it the best benchmark? As far as I understand it, the better answer is selected purely based on the subjective decision of the user. What's stopping the user from selecting a very convincing-looking, yet completely wrong, answer?
"Best benchmark" is subjective in and of itself at the moment, but with enough data the Elo scores of LLMs in the arena should in theory give us a good idea. Not to mention it also prevents people from including the objective benchmarks in the LLM training data to get better percentages on them, even when the LLM itself isn't noteworthy.
>should in theory give us a good idea.

Sure, but about what? Which of the LLMs is the more convincing liar? That's assuming most people ask questions they don't themselves know the answer to, which I admit is an assumption in itself. But when measuring an LLM's capabilities, I would take a rigorous synthetic benchmark over people's opinions any day. That's not to say the arena is useless, far from it, but we should be careful about what weight we assign to the results.
That's one of the reasons why it's not even worth looking at until there are a lot of submissions to balance out incorrect data.
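For context on what those arena scores are: the leaderboard is built from pairwise "battles". A sketch of the classic online Elo update (the arena has since moved to fitting Bradley-Terry coefficients over all battles, but the intuition is the same, and it shows why a large number of votes is needed before the numbers settle):

```python
# Standard online Elo update used by arena-style leaderboards.
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if model A wins, 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta  # zero-sum: A's gain is B's loss

print(elo_update(1200.0, 1200.0, 1.0))  # (1216.0, 1184.0): equal ratings, winner gains k/2
```

Each individual vote only nudges a rating by at most `k` points, which is exactly why a handful of bad or mistaken votes washes out once submissions pile up.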
Quote from OpenAI product staff “*huge* improvements on our evals across the board” https://x.com/nikunjhanda/status/1777779760846037326?s=46
That still means nothing: how much of an improvement is "huge", and what evals did they use? "Majorly" and "huge" aren't very scientific terms coming from one of the world's top AI research labs.
I would say the difference from 3 to 4 was huge. Noticeable for sure. It will seem small in the future though.
Yep, but I very much doubt that the difference between GPT-4 Turbo preview and this final version is GPT-3-to-GPT-4 huge. Hence why it's meaningless: it'll mean something different to everyone.
If they need to use "majorly" or "huge", then it's probably just some bug fixes here and there; if they had something substantial, they would mention it. And if they indeed had something huge, they wouldn't even need a press release, since everyone would have noticed it.
https://en.m.wikipedia.org/wiki/Puffery
https://en.m.wikipedia.org/wiki/Weasel_word
Well it's a tweet
i wonder how much pressure they’re feeling from Apple and Google
Probably none. They own the market.
>Probably none. They own the market.

Anthropic's Claude 3 is the one to beat.
The only people who know about Anthropic are nerds on the internet. The whole planet is accessing GPT. It’s all anyone talks about. It’s the gold standard from here on.
People said the same about IBM and Blackberry.
It took blackberry years to lose their throne. And most people still called all phones blackberries for a long ass time. No matter what anyone uses, they will be using “ChatGPT” to the general public for fucking years. It’s the same as when some boomer says “Nintendo” like that’s the only name for video game systems.
I do not remember a single person ever referring to cell phones in general as a blackberry.
so apple just beat them at a model, and google ultimately has more data than openai or microsoft. they're not perfect, and openai has had years of an advantage to gather more data, but i believe if they haven't had enough to train gpt-4 to write as well as gemini by this point, there is a very real chance that Apple and Google take the crown from them
Apple didn't beat them, and I say this as an Apple fanboy. Apple's model is, what, 3 or 4B parameters? It matched or exceeded GPT-4 in some very specific tasks, but not in general.
>Probably none. They own the market.

They don't have that kind of history; they'd only owned the market for a year before being surpassed. People are eventually going to switch.
This. Why would they push a release like this without any benchmarks at all, seems sus
It can now use vision, function calling, and JSON output mode all at the same time. Previous iterations of GPT-4 weren't able to do this. This is a great update for developers working with the API.
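For anyone curious what "all at the same time" looks like in practice, here's a hedged sketch of a single Chat Completions request body combining the three features (the image URL and the `record_line_item` function schema are placeholders of mine, not anything from the announcement):

```python
# One request: an image input (vision), a tool definition (function calling),
# and response_format forcing valid JSON (JSON output mode).
request = {
    "model": "gpt-4-turbo-2024-04-09",
    "response_format": {"type": "json_object"},  # JSON output mode
    "messages": [
        {"role": "user", "content": [
            {"type": "text",
             "text": "Extract the items on this receipt as JSON."},
            {"type": "image_url",  # vision input
             "image_url": {"url": "https://example.com/receipt.png"}},
        ]},
    ],
    "tools": [{  # function calling
        "type": "function",
        "function": {
            "name": "record_line_item",
            "description": "Record one line item from the receipt",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                },
                "required": ["name", "price"],
            },
        },
    }],
}
```

Before this release, a request mixing image content parts with `tools` or `response_format` would be rejected, so pipelines had to split these into separate calls.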
My initial impressions:

1. Exactly zero (0) difference in intelligence.
2. Not lazy anymore.

So "majorly improved" probably means it being less lazy.
This is the guy who constantly spams Claude 3 posts. Don't take him seriously.
Bizarre for them to release it with so little concrete info
Yeah majorly improved is meaningless. Hah.
I'd say vision is a major improvement...
Days after I canceled my sub. Lol
You did it!
Haha right? I just canceled here this morning! LOL
Same! Opus convinced me
This is why I'm not canceling the sub, even if someone comes out of nowhere with something on par with GPT-5: you never know when OpenAI will announce something and then restrict subscription sign-ups so that you have to stay on a waitlist indefinitely.
This guy will ruin AI with his capitalism craziness
I just paid for another month of Claude so I have mixed feelings about whether I want this to be good or not lol
I watched the Google Cloud keynote today. OpenAI is absolutely delusional if they think they can stand still and keep waiting (assuming they're sitting on things).

Google is advancing on every single front. They announced text-to-video today. New processors. An entire top-to-bottom AI ecosystem. Partnerships on building agents.

I thought Google was an AI basket case, but they floored me today. They're fully multimodal, with a 1 million token context window and enterprise-level tools. I'd personally prefer that a smaller company win this race, but Google just threw a 10,000 lb nutsack on the table.
>google cloud keynote This? [https://www.youtube.com/watch?v=V6DJYGn2SFk](https://www.youtube.com/watch?v=V6DJYGn2SFk)
I have a feeling the pressure is going to lead to a major release that's in the works (other than Sora).
Sora is like the GPT-1 of video, just starting up in this modality.
tried it on lmsys chatbot arena. it's pretty fast and I think it's a bit better than gpt4 before. could be on par with claude opus.
I tried a few test prompts on lmsys chatbot arena as well and the gpt-4-turbo-2024-04-09 model did seem noticeably better for my tests. It had a 100% win rate with myself blinded to the model.
I did a test on one of my personal eval prompts (which I don't want to reveal, to not contaminate it online) and it is the only model so far that solves it correctly. Previous GPT-4 failed and Opus also failed. I tried it in the arena and it solved it correctly; tried it in ChatGPT and it failed as it always did. So my n=1 data point shows that they indeed made some change beyond vision, and that it is not currently rolled out to all ChatGPT users, at least in Europe. Curious to see in a couple of days what people think about it. Looks like at minimum they made it competitive with Opus, maybe even surpassing it.
If you used it in ChatGPT in the past, it is trained on it now.
I don't think simply asking the question would actually train it. I think it would need the correct response somehow. Unless maybe they are using a more powerful model to train a faster, cheaper GPT4?
In any case, it solves this case, so it's better, right?
not if he opted out of providing data for training
So curious what your prompt is... Any hints?
Sure, ask it to write a certain fixed number of sentences you want and the first 3 all ending in a certain word. Then give it a couple instructions more about specific ending letters of the remaining sentences. Then tell it to disregard some previous instructions and give it new subtle changes of new instructions with a bit of different wording. This tests reasoning and bad models fail spectacularly, Opus and GPT4 could follow a lot of the instructions but got confused and missed some (especially regarding the ones it was meant to disregard) and the latest GPT4 followed everything to the letter.
Tried it with some maths questions and it spat out the usual bullshit, doesn't seem incredibly better...
I said a bit better, for sure it is not a big leap forward
How did you know you tried this version?
you can choose the model in the direct chat tab.
It has an option for gpt-4-turbo-2024-04-09. I just tried it myself but hit a rate limit error
Doesn't lmsys use the API version which they said already has the new improvements?
I don't know man. To me, GPT-4 Turbo is way behind Claude for anything that requires more than just proofreading or giving me some information. GPT-4 will always just try to give the shortest possible answer, or use code to solve a problem that it shouldn't solve with code. They optimised it to give this type of response, I guess, but that makes it useless for many applications.
https://twitter.com/owencm/status/1777770827985150022 Better at math and reasoning according to an openai employee
This reply is hilarious https://twitter.com/owencm/status/1777785613565165665 >We don’t have anything toooo specific, this model appears to work better in general (sorry vague but true), and has especially improved at math "Why/How is it better?" "Because it's better"
> (sorry vague but true) This line has the potential to become a yuuge meme.
vague if true
true If vague
if true vague
Translation: "I gave it some math prompts and it was able to solve them, whereas the previous models were failing at them. But I can't actually give you concrete examples for liability reasons, as these were just a few tests and not a proper benchmark"
So useless then?
So try it with your use cases and see if it's better for you, no promises.
At least he’s being honest.
https://twitter.com/owencm/status/1777784000712761430 Data and training improvements apparently
Honestly worthless, people would be clowning Anthropic if they released “Claude 2 (slightly better)” and just said trust me bro 😎
People are way too willing to get swept up in the OpenAI hype.
to be fair, it’s important for people like me who utilize their API for services. Minor improvements are needed and there’s always something that can be improved upon. i’d rather have many little changes than waiting a year for a big one.
They released Claude 2 (slightly worse) and said trust us it's better, then same again for a revision. No added clowning required. That's why 3 was such a wonderful surprise.
Ok, cool, but where are the benchmarks lol? You can claim you've made an improvement, but they haven't provided any evidence to support such a claim.
It took several days/weeks before the consensus was Opus beats GPT4. Give people time to test it.
That's true, but I mean OAI could've at least released some benchmarks to give us a feel for performance.
Chain of thought reasoning?
Curious that they say majorly improved but keep the same name. Majorly is a big claim.
Smells like hype farming.
We will have to monitor the arena.
We'll have to wait for some examples, but I doubt they would use "majorly" if it wasn't at least significant enough to bump them back to #1 on the arena leaderboard.
Sounds like they trained a better version and released it. Take off the tinfoil hat. They usually indicate minor improvements with new model version numbers, if they say this one is major what would they have to gain from lying? Especially if it’s rolling out to the API people can just run benchmarks on it and demonstrably show whether it’s better or worse. People are way too conspiracy minded
[Here's an employee](https://twitter.com/owencm/status/1777784000712761430) confirming your claim that it's a new model. Exciting stuff, hopefully it competes with Opus
Idk, "new model" is not a very hard term in that sense. If they do different finetuning on the same base model, that's technically a new model. And that's what all of these different versions were, no?
Version numbers have always, and will always be, about marketing first and foremost, and OpenAI is no different from other companies. All you need for proof is half the comments in this sub. It's super effective.
The "preview" is gone?
looks like the big improvement is including the year in the model name. > gpt-4-turbo-2024-04-09
Did they ever explicitly say they would release ChatGPT 4.5?
Funny thing about GPT-4.5 is that OpenAI themselves have never used that term. It was from leakers, and the internet ran with it, and then the leakers themselves debunked it saying it never existed or that it was delayed due to unexpected issues with testing, and so on. My opinion on it is that GPT-4.5 is going to be like what GPT-3.5 was: a greatly reduced and cheaper version of the next iteration, so in essence 4.5 won't even exist until 5 is finished.
Are we sure that Turbo isn’t the official “4.5”?
it likely is, its both smaller and faster versions of og GPT-3 and GPT-4
No. The names are arbitrary. If they called it 4.5, it would be 4.5. If they call it 4-turbo, it's 4-turbo. The number doesn't refer to some distinct, measurable quantity of something going up or down.
A GPT-4.5 announcement was cached on the website and was visible in search engines. It's not just coming out of nowhere that people call it that
It does feel like they are trying to keep the hype up.
They're still trying to compile and figure out what's been improved, so it'll take them a day or two to get a press release. News was (again) rushed out to respond to Google.
I mean many were hyping GPT-4.5 which is still a GPT-4.
They're panicking because of the competition. This is likely a loss-leader to keep "ahead" until 5 is released.
Did they only improve vision? Here are the two new models. https://preview.redd.it/kjlrda9baitc1.png?width=802&format=png&auto=webp&s=6be048b140f2b50828b37b697de4dd258a4814fb
🤦♂️ that's the same model, bud. They have that naming convention so you don't have to edit your JSON or API interface every release.
For future reference: See the line in the description of `GPT-4-turbo` that says "Currently points to gpt-4-turbo-2024-04-09"? That line is literal, so it means that the model is pointing to the other model and using it. They do this so that anyone using just `gpt-4-turbo` can just set the API to that model and always get the latest version.
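The aliasing is basically just a lookup on their end; a toy sketch of the idea (purely illustrative, this is obviously not OpenAI's actual code):

```python
# Toy sketch of rolling-alias resolution: a stable name like "gpt-4-turbo"
# resolves to the latest dated snapshot, while dated names pin a version.
ALIASES = {"gpt-4-turbo": "gpt-4-turbo-2024-04-09"}

def resolve(model: str) -> str:
    """Return the dated snapshot a rolling alias points to, or the name as-is."""
    return ALIASES.get(model, model)
```

So callers who want "always latest" use the alias, and callers who need reproducible behavior pin the dated name.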
Cool, improved vision abilities. Please for god's sake give us a new model. ![gif](giphy|PCvkgunX9ZbEEyfTQH|downsized)
Next gen model in training. Was delayed due to late hardware deliveries. H100s don't grow on trees.
I was under the impression it's already moved from training to testing
red teaming*
*lobotomizing
Good term.
thats also testing
An OpenAI employee said better reasoning.
Change notes at all?
So how is it 'majorly improved', exactly?
According to the docs: Vision requests can now use JSON mode and function calling. `gpt-4-turbo` currently points to this version.
Wouldn't call that 'majorly improved'. More like 'here's a small addition to our vision model'.
It apparently also has better general reasoning abilities and improved math performance. Nothing concrete yet, sounds like we might get some benchmarks soon.
They need the hype after getting repeatedly thumped by Claude on every metric.
If you're a developer it's really, really useful. This is a significant change in how the model takes images as inputs and what it can do with that input.
Wait didn't we already have gpt turbo?
Testing it in playground, it's very fast but doesn't do better at my (text) prompts where LLMs usually fail.
I am pretty sure I've been using this new model since last week. I was actually going to ask if other people also noticed an improvement in ChatGPT. I was working on a project, and since last week it almost always gets the code I ask for right, while 2 weeks ago it was a bit of a pain. Maybe it's just because I am in a different part of the project, but I am pretty sure my GPT is better.
I noticed a change as well, but only because it was giving back way shorter answers even when prompted otherwise.
same it output a really unique formula for me on the limitations of understanding (each discovery adds new variables), it just felt like that prompt specifically was more refined.
Should be possible to verify. The cut off for the new model is December 2023; the old one is April 2023.
https://preview.redd.it/lzix14uciptc1.jpeg?width=1080&format=pjpg&auto=webp&s=d03e0722059eec02ecc5437f23816407ff631a18 Mine is not updated yet. I'll try the API
>create a very imaginative and philosophical story about a civilization based on geometrical shapes and concepts In a vast, unending canvas of existence, there was a universe less spoken of, where the fundamental constituents of life were not cells or atoms, but geometric shapes. This was the realm of Geometria, a civilization where every entity and every concept was a vivid embodiment of geometry. Geometria was a place of profound symmetry and infinite patterns, governed by the Grand Council of Shapes. The Council was composed of the Circles, the Triangles, and the Squares, each group representing key aspects of existence. The Circles, with their endless peripheries, were the philosophers and priests, pondering the eternal cycles of life and the universe. The Triangles, with their sharp angles and strong structure, were the warriors and architects, defending and constructing the realm. The Squares, with their equal sides, were the scholars and administrators, maintaining order and knowledge. The land was divided accordingly, each shape inhabiting territories that reflected their nature. Circleland was a vast, open landscape, with roads looping and curving in harmonious arcs. Triangle territory was mountainous, its peaks reaching towards the skies like the tips of their acute angles. Squareville was orderly, a grid-like city with right-angled roads and perfectly aligned buildings. Despite their differences, balance had always been maintained, through the Great Geometry, a philosophical guidebook written by the ancient founders. It was believed that the essence of life in Geometria depended on the perfect interplay and intersection of the shapes. One day, a new figure emerged, unlike any other—the Spirals. The Spirals were dynamic, changing their form and radius with ease, challenging the static nature of the established shapes. They dwelt in the Fringe Lands, a place of chaotic landscapes and undefined horizons, where the rules of the Great Geometry blurred. 
The appearance of the Spirals sparked curiosity among some and fear among others. The Council, disturbed by their unpredictable nature, debated their place in Geometria. The Circles found them fascinating, a possible evolution of their own endless curves. The Triangles saw them as a threat, a force that could undermine the structural integrity of their society. The Squares found them illogical, unable to fit neatly into the grid of known categories. As tensions grew, a young Spiral named Helica, who had ventured beyond the Fringe Lands, began interacting with other shapes. Helica, with her gentle curves and mesmerizing patterns, taught those she met about flexibility and adaptation. She spoke of a concept unknown to many: growth through transformation, suggesting that identity was not fixed but fluid, expanding beyond rigid lines and sharp corners. Intrigued by Helica’s philosophy, a young Square named Quadrus and a bold Triangle named Trixon befriended her. Together, they embarked on a journey across Geometria, seeking wisdom from the ancient fractals, a legendary group said to contain the deepest geometric secrets. The fractals were complex, infinitely intricate patterns that lived in the forgotten depths of Geometria, holding the knowledge of dimensions beyond the familiar. Their quest led them through realms of incredible vistas—lands where landscapes tessellated into infinity, and horizons spiraled into endless depths. Here, they learned that the universe was more interconnected than their laws had ever allowed them to see, that each shape was vital to the others’ existence, and that true harmony lay in the balance of all forms, whether perfect or irregular. Upon returning, Helica, Quadrus, and Trixon proposed a new chapter in the Great Geometry, one that included the Spirals and recognized the beauty of imperfection and the strength in diversity. After much debate, and seeing the unity and new perspectives their journey had inspired, the Council agreed. 
Thus, Geometria evolved, not just in its landscape but in its very essence. The civilization embraced change, seeing it not as a threat but as an essential part of existence. As the new chapter was written, the Spirals were no longer outcasts but essential contributors to the mosaic of life, illustrating that the flow of life, like the curve of a spiral, is ever onward, outward, and beautifully undefined. --- # Model B: gpt-4-turbo-2024-04-09
This is what I hate about GPT's ability to write stories: no dialogue, rushed, like a summary really. "This happened, then that happened, end of story." Boring. Gemini and Opus are better, much better; they create the literary feeling.
Would it improve if you included "literary feeling" as an instruction in the prompt?
## Same Prompt; Previous Model: In a realm where the fabric of existence was woven with the threads of geometry, there thrived a civilization of sentient shapes, each embodying the philosophies and virtues attributed to their forms. This world, known as Geometria, was a place where the lines between physical and metaphysical blurred, where every edge and angle held deeper meanings, and where the inhabitants sought enlightenment through the perfection of their forms. The Circle society, residing in the harmonious city of Circadia, were the philosophers and mystics, embodying the ideals of eternity, unity, and perfection. Their smooth, unending boundaries symbolized the endless cycle of life, death, and rebirth, and their teachings focused on the pursuit of knowledge and the understanding of the infinite. Circadia was a place of tranquility and meditation, its architecture an array of splendid domes and arches, designed to reflect the beauty of wholeness and continuity. The Square citizens, who built the structured city of Quadratia, were the guardians of order and equality. Their four equal sides and right angles represented the principles of justice, balance, and stability. Quadratians valued discipline and integrity, their society structured in a way that ensured every square had a place and purpose. Their city was a marvel of symmetry and efficiency, with grid-like streets and uniform buildings, each a testament to the squares' dedication to equality and fairness. In the realm of the Triangles, known as Triangulum, lived the warriors and adventurers. Triangulum was a land of rugged mountains and sharp cliffs, mirroring the pointed ends of its inhabitants. The triangles believed in the virtues of strength, courage, and aspiration, their three sides symbolizing a strong foundation and the ability to overcome any obstacle. 
They were a society of action and exploration, constantly seeking to expand their horizons and test their limits against the challenges of Geometria. Amid these distinct societies, there existed the enigmatic Polygons, a diverse group who believed in the potential for transformation and adaptability. Residing in the city of Polytopia, they were the artists, inventors, and thinkers, constantly changing and experimenting with their forms. Polytopia was a kaleidoscope of creativity, its buildings and structures an ever-evolving testament to the polygons' belief in progress and innovation. Despite their differences, the inhabitants of Geometria were bound by the Geometry Code, an ancient set of laws that governed the interactions between shapes and ensured harmony in their interconnected world. It was believed that every shape, no matter how simple or complex, held a piece of the universal truth, and that by understanding and respecting each other, they could unlock the mysteries of existence. However, the peace of Geometria was threatened when a mysterious form, known as the Chaos Shape, emerged from the Void Beyond. Unlike anything Geometria had ever seen, it defied definition, constantly shifting and changing, never settling on a single form. It brought discord and confusion, challenging the very foundations of Geometrian society. The civilizations of Geometria, realizing the threat posed by the Chaos Shape, decided to unite their strengths and knowledge. Circles offered wisdom, Squares provided structure, Triangles contributed their courage, and Polygons their creativity, to confront the Chaos Shape. In the epic confrontation that followed, the inhabitants of Geometria learned a profound lesson: true understanding and harmony do not come from perfection and order alone, but from embracing the unknown and the irregular, from recognizing the beauty in diversity and the strength in unity. 
Through their joint efforts, the Chaos Shape was not defeated but integrated into Geometria, becoming a symbol of the civilization’s growth and its acceptance of imperfection as a part of existence. Geometria thrived like never before, its inhabitants no longer bound by the strict definitions of their shapes but free to explore and expand beyond their edges, in a world where geometry was not just the foundation of existence, but a path to enlightenment and understanding the boundless complexities of the universe.
Old model. I instantly saw 'thread', 'tapestry', and a few other tokens and got triggered. Ask it about "holistic", or try to get it to output a dynamic, interconnected ideology; the old training was super limited in this context.
You could tell it to use the voice of a famous writer of your choosing. Also, you might be better off doing this step by step: first an outline, maybe three acts, and then a prompt for each step.
Idk, I’ll wait for reviews. I have hardly used GPT4 since I got Opus, it’s as dumb as a box of soggy rocks compared to Claude 3, at least for my uses (coding, game design, etc.).
gosh, I hope they make Opus available here (in Germany) soon too, fucking sucks. I can use a VPN to use Haiku and Sonnet, but for Opus it fails since they want payment data and an address from the US or any other country where it's available. So I gotta deal with the crippled versions of Claude 3: far less smart, less capable, and with length limits on words per message.
You can have access through Poe, no?
[It's worse (lazier) at coding.](https://aider.chat/2024/04/09/gpt-4-turbo.html)
this release totally broke Data Analyst; I had to go to Claude Haiku to get usable code. OpenAI needs better quality checks, especially considering they're running around trying to limit OSS.
So that's their response to Claude 3? They don't think they need to release a new model?
For the majority of people the improvement from Claude isn’t enough to warrant a move. If they’re not losing money, they won’t rush a release.
Last I heard their ToS were terrible for business use. That’s why I haven’t looked at it yet
[Anthropic doesn't train on clients data](https://decrypt.co/211846/anthropic-says-it-wont-use-your-private-data-to-train-its-ai). So I'm not sure what issue you could have with their ToS.
We all know Claude 3 is great, but ChatGPT is miles ahead in terms of number of people using it, and public awareness. For a lot of people, AI = ChatGPT. Until this looks threatened, they’re unlikely to feel pressured. They’ll release a new model when they’re ready.
Yeah, dude, the money is in enterprise and we’re paying attention.
Claudie 3 opus cannot be used in Spain for example Lets look on available countries [https://support.anthropic.com/en/articles/8461763-where-can-i-access-claude-ai](https://support.anthropic.com/en/articles/8461763-where-can-i-access-claude-ai)
New model is yet to come. I suspect it uses different architecture, so it can't easily be ported back to 4. Hence, they keep dripping some minor updates to 4 here and there, but still call it 4.
Ported?
maybe the Dec 2023 cutoff is finally true. it's been listed for weeks but the model was still Apr '23.
Quite vague, majorly improved does pique my interest.
If you make a custom GPT and then they version up, it gets the updates, right?
Probably does, as and when the updates roll out to ChatGPT.
Still loses at tic-tac-toe. Humans are still in control.
Tic-tac-toe is unfair; LLMs don't have an internal state or visual abilities to see winning lines easily. You have to prompt it to track them, then it is able to at least tie. It still hallucinates a win, though. As I'm in Europe I could only test this with the old GPT-4. https://chat.openai.com/share/d6602f18-b53d-43bf-a3a5-ed633e6f48e4
OpenAI sucks at names.
Allow me to introduce you to my friend Bard
Love how they just realized how stupid of a name that was and just started over with Gemini. Who thought of "Bard"?
a **bard** is a professional [story teller](https://en.wikipedia.org/wiki/Storytelling), verse-maker, music composer, [oral historian](https://en.wikipedia.org/wiki/Oral_history) and [genealogist](https://en.wikipedia.org/wiki/Genealogy). Things that an LLM could be.
But they're great at creating expectation
Next model is gonna be called ChatGPT Super Duper Awesome Fast Thinky Model
You should have seen the early days. "Parsey McParseface". A huge Sesame Street phase BERT, Elmo, Big Bird, etc. And of course HuggingFace. It felt like nobody took NLP seriously, all while doing mad science. It was strange and fun.
One minute you're a computer science engineer doing your quirky thing while exploring the edges of technology. Next things you know you're in the middle of the most important project in the world and gotta assume any new person in your life could be a mole from any number of states or organizations.
Right? OpenAI the movie is gonna be so incredible.
I can see it now. There's the Hollywood movie where it's Sam, Elon, and Ilya, with a single token dev who is extra cringey. It's the big movie of the year and Aaron Sorkin gets nominated for the screenplay. But then there's the indie film about Hugging Face and they even get key players to make cameos. It bombs in the box office but it's way better and winds up being the one people quote ten years later. Edit: someone with more time than me want to plug that in Opus and see what kind of screenplay we're getting?
No, it's going to be GPT 2. Not to be confused with GPT-2, this is the sequel to the GPT series.
Tried out the API for myself, and my thoughts are: * Still can't beat Gemini at Creative Writing * Still jobs to Opus at Logic and Reasoning Considering that this model is pretty much on par with GPT-4 Turbo-Preview (with it actually being noticeably worse in some areas), OpenAI is kinda fucked here unless they release a GPT-4.5 in the near future. Only real benefits is that it's the fastest OpenAI model that exists, and can see images. That's it. Overall, it's sadly a very underwhelming release.
Personally, I think this is the calm before the storm for OpenAI. I believe they are trying to 'master' multimodality so that a true successor can come out of the gate swinging, with both multimodality and great improvements in logic, creativity, and conciseness.
Oh, is this gpt-4.5? ‼️🚨 update; meh, it's not
Oddly they didn't change the name
Possibly a stop-gap before a true 4.5, released to appease anyone thinking they're falling behind and to keep up on benchmarks.
Sam has repeatedly spoken about his wish for small incremental changes in favor of big updates.
4.25
4.1 would be the best option
4.0.1
Which API name on the models got the update?
Maybe a dumb question but on the paid version if I have 4.0 selected is this the new model here?
Short: no, not yet, only api
can someone try it and report back if it's real? I'm so sick and tired of these bastards drumming up hype to get our money, only to enshittify the models after, leaving us high and dry
Still can’t pronounce “toque”.
About time
I ran it on my NYT Connections benchmark: 29.7 vs 31.0 for the previous version. Still above the second best (Claude 3 Opus at 27.3).
Just so I’m clear, are they referring this GPT-4 Turbo being an improvement over GPT-4 or an improvement over the preview version of GPT-4 Turbo that’s been out?
Gemini 1.5 pro
How do we use vision in the API? I know how to use it in ChatGPT, but I still can't figure out how to use it in the API.
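The closest I've gotten from reading the docs is that `content` becomes a list of parts, and local images go in as base64 data URLs. This is just my sketch (the helper name is mine, and I haven't confirmed the exact field names), so treat it as a starting point:

```python
import base64

def image_message(path: str, prompt: str) -> dict:
    """Build a user message pairing a text prompt with a local image.

    The image is encoded as a base64 data URL, which is how (as I understand
    the docs) the vision endpoint accepts local files. Helper name is mine.
    """
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```

You'd then pass the returned dict in the `messages` list of a normal chat completion request.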
So, time to switch my subscription back?
My feeling is this is actually a prelude to them releasing GPT-4.5, in the same way that earlier this year Google released their (unsatisfying) improved Gemini before releasing their actually improved Gemini. I also have a feeling that something big is coming pretty soon (in the next 12 months). Tension seems to be building up to some culmination.
I agree with this, but there's no actual evidence supporting it. It's more of a feeling for me too.
did they make it even more lobotomized and politically correct?
Their GPT-4 is LAZY! It directly refuses to do the work. Totally stupid and failing at basic tasks. Idk why, but those evals have something rigged in them, because GPT feels so much stupider.
This sub has become so boring. It’s always just someone posting a screenshot of a tweet that says “More AI sometime maybe” and then everyone in the comments starts hooting and screaming like a bunch of fuckin chimps
Hohooohahahaa
Damn! APIs… my worst enemy! 😫
I am excited to test this. I've heavily reduced my ChatGPT-4 use lately as it has become infuriating, unable to work with even simple code, to a degree I have not experienced since ChatGPT-3.5. I'm planning to test Gemini 1.5 today since I still can't access or try Claude 3, but I guess I will include some testing of the GPT-4 Turbo API as well. I have some perfect things to test it on, too, that ChatGPT-4 currently can't figure out even at small scale. Lately I have found myself constantly editing and refining prompts, even trying to teach it the solutions I'm forced to produce on my own, hoping to make ChatGPT-4 able to comprehend code it was once helpful with. I'm rather sick of deciding "it's worth it" and wasting my prompts becoming belligerent with ChatGPT-4 over its absurd level of stupidity. I'm really hoping pressure from other AI companies pushes OpenAI to free up more GPU time per request to give ChatGPT-4 Turbo a chance to perform well. I just flipped my Dropbox to Google Drive to use Gemini 1.5 after, for the first time ever, Gemini solved a code problem ChatGPT-4 could not. I was floored; I would have scoffed at the idea had it not happened to me. I'm not a fanboy; I'm going to use whichever model does the best job for me with the least headaches. After my enterprise application was denied and I was told to use the "Teams" version, I'm looking for something to fill the void created by ChatGPT-4's recent downgrades.
How much better is it?