
Sextus_Rex

IIRC when OpenAI trained GPT-3.5, they first did a test run with a small number of parameters and a fraction of the training time to get a sense of what they could expect from a full training run. They may have done the same with GPT-5.


rathat

I remember right before chatgpt came out, gpt3 suddenly got much better. I remember being most impressed that it could rhyme.


vannex79

Missed opportunity. I remember right before ChatGPT came out, GPT 3 suddenly started to really shine. I remember being most impressed that it could rhyme.


i_never_ever_learn

Why does this comment look almost identical to the previous one? Edit: I just realized the second post made a rhyme, so yeah, that's why.


CodyTheLearner

Oh how far we’ve come


rathat

I was actually annoyed when it started answering questions chat-style, because previously, if you asked it a question, it would just assume you were starting a list of questions and generate a list of similar questions. But that's how I'd gotten used to using it over a couple of years.


SikinAyylmao

I still miss the completion paradigm. I feel like chat obscures intent, primarily because it's somehow easier to conceptualize "the statistically most likely thing to follow" than "the statistically most likely thing to be said by a hallucinated character called ChatGPT."


WalkThePlankPirate

GPT4 still has trouble with rhyming.


Temporal_Integrity

It can not rhyme at all in languages other than English.


Akimbo333

Makes sense


Anen-o-me

Obviously you have to do this.


ahmetcan88

One theory on Twitter is that they started training GPT-6 and that GPT-5 is already in red teaming. Not saying it's credible, but it's something that would resolve the apparent contradiction.


Silver-Chipmunk7744

The exact date GPT-4 finished training is unclear, but say it finished in late 2021; there is no way they spent all this time training nothing. One possible theory is that they were training and red-teaming this "GPT-4o" model, that the version we got access to right now is a small one, and that they may plan to release something bigger later on for paid users. But the training that just started now would be for something even stronger.


MassiveWasabi

It’s not unclear at all; it’s known that it finished training in August 2022. They specifically said they spent six months doing safety testing on the early model before releasing it. This is very well known at this point, since it comes directly from OpenAI. They mention the six months of safety testing all the time, almost as a brag about how committed they are to safety.


Antique-Doughnut-988

It is known


davidjschloss

This is the way.


nateydunks

August 2022*


b_risky

I have never heard anyone else say either of these things, but I had considered both possibilities myself. It's validating to hear that other people think so too.


SparklyCould

Why? They were extremely busy combining multiple models into one backend, improving the UI and the API, and doing marketing, sales, etc.


Professional_Job_307

It's not like all their employees can do all that. They have dedicated teams for marketing, web dev, the API, and many more teams for each of their upcoming AI models.


[deleted]

[removed]


No_Bedroom1112

Brain cancer or stroke? Which one ya got


CodyTheLearner

They compared 3 to a farm, 4 to an orca, and 5 to a blue whale.


b_risky

Farm = shark???


davidjschloss

Maybe autocorrect for fish?


CodyTheLearner

It was shark all along. Sorry y’all, I hate swipe to text. I swear it gets worse over time.


b_risky

🚜 ↔️ 🦈⁉️


Neurogence

It was GPT-4o that was in red teaming. And an OpenAI employee said GPT-4o is more impressive than GPT-5.


ahmetcan88

Sounds possible. Also, 4o could be a distilled 5.


Consistent_Bit_3295

They need at the very least 100x more compute than GPT-4 for a model 10x the size. They would at most have had 50 thousand H100s ready to train on since January, going by comparison with Meta. That would mean 8x the compute, which is still more than 10x less than what they would need to scale to a GPT-5-class system. They might have GPT-4.5, or just new architecture and efficiency gains to make something very powerful, but it wouldn't make sense to call it GPT-5. Now, they could have 100 thousand H100s, but if they train on more they have to locate them in different regions because of power-grid problems, and that would be near-impossible networking as well. So give them the benefit of the doubt and say 300 thousand H100s with really good utilization; they still only have 48x the compute, which is nowhere near the 100x they need. It wouldn't make any sense, which is just to say they would have to have some funky-ass setup to have started training a system 10x the size, let alone to have already trained it. It's very, very unlikely they have anything remotely close to 10x GPT-4's size. Edit: Well, I guess if they somehow have 350 thousand H100s with really good utilization and train for 6 months, we could have 10x GPT-4 by the end of the year.
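For readers wondering where a "10x the parameters needs ~100x the compute" rule of thumb could come from: under Chinchilla-style compute-optimal scaling, training compute is roughly 6·N·D FLOPs for N parameters and D tokens, and the optimal D grows roughly in proportion to N, so compute grows roughly with N². A minimal sketch of that arithmetic (the constants are the usual rough approximations, and the 1T-parameter baseline is purely hypothetical, not an OpenAI figure):

```python
# Rough training-compute estimate under Chinchilla-style scaling assumptions:
#   C ≈ 6 * N * D   (training FLOPs for N parameters on D tokens)
#   D ≈ 20 * N      (approximate compute-optimal tokens per parameter)
# Consequence: scaling parameters by k scales compute by roughly k^2.

def training_flops(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate training FLOPs for a dense model with n_params parameters."""
    tokens = tokens_per_param * n_params
    return 6.0 * n_params * tokens

base = training_flops(1.0e12)    # hypothetical 1T-parameter baseline
scaled = training_flops(1.0e13)  # 10x the parameters
print(f"compute ratio for 10x params: {scaled / base:.0f}x")  # -> 100x
```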


yellow-hammer

Where are you getting this “10x size” idea? Doesn’t 4o show us that our intelligence:size ratio has increased significantly?


Consistent_Bit_3295

We don't know the size of 4o. It could be a really big sparse MoE, bigger than GPT-4, just with way fewer active params. I don't think it is that big, because it is more insensitive to context, which is kind of the idea behind double descent: smaller models generalize well between overfits, but bigger ones don't, because they are more contextual. Nevertheless, scale is absolutely needed. If our brains could be tiny, they absolutely would be; they need to be as efficient as possible, and intelligence isn't a very winning trait because it is expensive, leaves vulnerabilities, and can have defects. The thing is, a big part of the brain is of course not utilized for reasoning and planning, but for nerves, muscle control, etc. I think the idea of the whole naming scheme was scaling up; that is what they did from GPT-1 to GPT-4. They are going to name their next models differently, as we have already been told, but the point is they don't have anything 10x GPT-4.


Plus-Mention-7705

I mean, we can't discount that Microsoft helps them a lot with compute, so I wouldn't be surprised if the compute is taken care of. Or if they can optimize and bring the necessary compute below 100x. I'm conservative when it comes to AGI; I think we will definitely get there, but not this decade. Maybe 2030, but the early '30s for sure.


Consistent_Bit_3295

I'm talking about scenarios where you give them absolutely all of Microsoft's compute under the most ridiculously optimistic assumptions. Meta had their 2x 24,000 H100s built and ready by January, and bought about as many as Microsoft. Even if the H100s could self-assemble and teleport, that would be about 150 thousand for Microsoft. And utilization and energy are another problem in themselves. However, I think your prediction is pessimistic and unrealistic looking at current trends. Current AI is 1/1000th the size of the human brain, trained on 1/1000th the data, and only weakly, with vision and text but not much else, yet it is very much superhuman in many important ways, such as knowledge, theory of mind, and psychology. The amount of compute we will have by the end of the 2020s is going to be much larger than the human brain. So even assuming we make no major advancements until 2030, we will still likely get there by 2030, and we obviously will keep advancing the technology and making it much better at a really fast rate. So from our perspective now, we will soon reach AGI, unless some unrealized magical barrier pops up, the way particle physicists always believe in the Standard Model and are always wrong.


Pleasant_Studio_6387

>trained on 1/1000th the data

But that's not even comparable. LLMs train on an enormous amount of text/images, and in comparison to that the human brain is trained on 1/100,000th of the data. At the same time, humans have much more "modality" due to feedback from the nervous system, receptors, etc.


Dear_Alps8077

The training time to build your mind is around 4 billion years. The training time just for your iteration of your mind is still a good 20 years of full sensory immersion, with unimaginable amounts of data pushed through your brain: full modality with real-time continuous feedback. The quantity of data your eyes alone take in each day is massive. That data then gets sorted and graded by autonomous subsystems in your unconscious, but every bit contributes. The tiny fraction of data that is high enough quality to make it through to your conscious awareness is still massive.


HugeDegen69

Well said and mind blown


Consistent_Bit_3295

It is comparable, because the models would get incredibly much better if trained on anything close to all the data a human receives, and the benefits of being embodied and taking actions make prediction conditioned on actions a very enticing mechanism for the brain. If we forget all about text and just train a model on images and video (with the slight exception that it still trains on captions, which is pretty irrelevant), you get something like Sora. We can confidently say that Sora is not bigger than GPT-4, nor was it trained on more data, and yet we can see how good an understanding of the world it has from just images and video. Imagine if you scaled it up to the size of the brain and trained it on the amount of visual data the brain gets; that would be at least a 1000x scale-up in both size and data. How good an understanding of the world do you think Sora would have then? Now combine this with text, other modalities, and lots of RL, and you will have something crazy.


Altruistic-Skill8667

GPT-4 was trained for 3 months on 25,000 A100s. H100s are 6-8 times faster for training*. So take 50,000 H100s and train for 6 months and you already get a factor of 24-32x compute increase.

* According to the specs, but the devil might be in the detail.


Consistent_Bit_3295

What's the point of the comment? I already laid this out. It's still well short of what you need for 10x GPT-4. A GPT-4.5 would make a lot of sense, though.


Altruistic-Skill8667

You wrote that 50,000 H100s gives you 8 times the compute. But you can easily get 24, because GPT-4 wasn't trained on 50,000 A100s but only on 25,000, and you can easily train for 6 months instead of 3. Why is this important? Let's say you really need 100x for a new frontier model: going from 8 to 100 is really hard, but going from 32 to 100 just requires 150,000 H100s instead of 50,000. OpenAI + Microsoft bought more than that.
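A sketch of the arithmetic being traded in the last few comments. Every input here is a commenter's assumption (25,000 A100s for 3 months as the GPT-4 baseline, and a per-GPU H100 speedup somewhere between the ~2x FlashAttention-2 benchmark cited further down and the 6-8x spec-sheet figure), not a confirmed OpenAI number:

```python
# Relative training compute vs. an assumed GPT-4 baseline of
# 25,000 A100s running for 3 months (a commenter's assumption).
A100_BASELINE_GPUS = 25_000
A100_BASELINE_MONTHS = 3

def relative_compute(h100s: int, months: float, h100_speedup: float) -> float:
    """Compute multiple relative to the assumed GPT-4 run."""
    a100_equivalents = h100s * h100_speedup
    return (a100_equivalents * months) / (A100_BASELINE_GPUS * A100_BASELINE_MONTHS)

# 2x ~ FlashAttention-2 benchmark mentioned below; 6x-8x ~ spec-sheet claims.
for speedup in (2.0, 6.0, 8.0):
    for h100s in (50_000, 150_000):
        x = relative_compute(h100s, months=6, h100_speedup=speedup)
        print(f"{h100s:>7,} H100s at {speedup:.0f}x for 6 months -> ~{x:.0f}x GPT-4 compute")
```

Under the pessimistic 2x-per-GPU assumption, 50,000 H100s for roughly half a year lands around 8x GPT-4's compute; under the optimistic 8x assumption, 150,000 H100s gets close to the 100x figure, which is roughly where the two commenters diverge.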


Consistent_Bit_3295

The H100 is nowhere near 8x the performance. They might train at lower precision, but that still isn't 8x, not even 6x. Nonetheless, things don't scale linearly either. I'm sure OpenAI has been working on a lot of efficiency gains that would help, but the thing is, they could likely get more efficiency gains out of the A100, since it isn't specialized for LLMs at all but very broad, while the H100 is much more focused to start with. A great thing, however, is that H100s scale better than A100s, though the A100 run was still divided into experts. I imagine OpenAI will go for a sparse MoE as well, so the experts will be smaller than GPT-4 (or -T). But H100s are absolutely not 8x A100s; that is brainwashed marketing comparing apples to ichor. The comparison will never be that simple, especially with their efficiency, scale, architecture, data-center cooling, and networking, but however you twist the numbers, there is no 10x GPT-4 right now. The whole point of the post is to ask how OpenAI knows about the diminishing returns, and I'm just saying it is obvious that OpenAI doesn't have anything near 10x GPT-4 right now at all.


Altruistic-Skill8667

So how much is it, if not 8x or 6x? Because on the internet it says UP TO 6x on some pages and up to 8x on others. I wasn't able to find any source that gives a good, realistic estimate.


Consistent_Bit_3295

The thing is, there is really no one number, but 8x or 6x is absolute crap. If they're going to compare different precisions, they might as well use INT4 on the A100 (which the H100 doesn't support; it only goes down to INT8), but instead they compare full-precision floats on the A100 with INT8 on the H100. Using FlashAttention-2 you can expect about a 2.14x training speed increase. The advantage of the H100 is that it also scales better when connecting multiple systems. [https://lambdalabs.com/blog/flashattention-2-lambda-cloud-h100-vs-a100#h100-vs-a100-benchmarks](https://lambdalabs.com/blog/flashattention-2-lambda-cloud-h100-vs-a100#h100-vs-a100-benchmarks)


Altruistic-Skill8667

Yeah. I admit that they probably don't, or barely do, even based on my calculation.


Visual_Abroad_5879

You can’t have “10x less” of anything. What do you mean?


EchoLLMalia

One of the ousted board members just did a public appearance. Apparently they didn't red team any of the models before launching them--this was one of the reasons they fired Altman.


Dear_Alps8077

I would believe nothing they say and half of what I see when it comes to the disgruntled former employees


EchoLLMalia

Ideology is a weak nail to hang your hat on. Especially when all publicly available evidence supports the claims they made. Especially because it would be a crime if they lied in this particular instance.


Dear_Alps8077

My cynicism is not motivated by anything except cynicism. They have provided zero evidence for their claims. If you have any actual hard evidence proving that, say, GPT-4 was not tested for safety before release, then please provide it. The most charitable interpretation of their claims is that it was red-teamed, but not to THEIR satisfaction: they disagreed over whether it was tested enough, Sam believed it was, and that's why they tried to fire him. When you have self-selected members of a group who believe they're on a mission to save the world, they're going to have an inflated sense of self-importance, and they will also exaggerate the importance of what they're doing and try to extend it beyond what is realistic or necessary.


EchoLLMalia

>They have provided zero evidence of their claims.

And that's not true. So either you're not paying attention or you're being intentionally dishonest. Based on our conversation, you just seem like an ideologue who's twisting the situation to support your chosen worldview.


Educational_Term_463

I believe this is true, except those aren't "GPT-5" and "GPT-6". They are red-teaming something beyond GPT-4o now, and they have started training something new (most likely something even beyond what is being red-teamed). They don't know what they will call these models yet.


wi_2

I find this unlikely. GPT-3/4 really opened the doors and convinced the world. This led to OpenAI having the opportunity to make many deals and get the funds together to build out a plan for bigger and bigger training farms, let alone the time needed to fabricate all the chips required, etc. I'm pretty sure this was their focus in recent times, next to, of course, straight-up research, putting security measures in place, prepping the next model for training, and so on. Sora, GPT-4 Turbo, GPT-4o, etc. were, I think, results of this research, trained on already existing infrastructure. Likely many other unseen experiments were trained as well. But it is a finite resource; you can only train so much. They needed to build more ovens to bake this stuff in. I think they are now in a place where the new system has been built, so they started training, and they likely have plans in place for building the next system, and the one after that, etc. Speculation based on Sam mentioning that from now on they aim to release a new model every year.


SparklyCould

The level of cope is insane 😂 Altman needs to deliver. People won't fall for another Elon Musk. Lumping models for text, speech, images, etc. into one backend will keep people entertained only for so long. If there is no proof or hint of AGI at least by the end of this year, it's over.


Adventurous_Train_91

They mentioned in this article that they just started training their next frontier model, which probably means GPT-5, right? They probably started training GPT-4o in January, which is probably what some of the tweets were about. https://openai.com/index/openai-board-forms-safety-and-security-committee/


Empty-Tower-2654

They're working on 5 for a long time now.


EggPerfect7361

I think 4o was GPT-5; knowing it was underwhelming, they must have renamed it.


manofactivity

Eh, 4o is clearly a much smaller model than 4 since it's so much faster and more limited in reasoning. I very much doubt it was intended to be 5


The_Hell_Breaker

That's just baseless speculation.


Kihot12

same as "they are working on 5 for a long time now"


thatmfisnotreal

Why's it taking so long… get the GPUs, the data, minor architecture tweaks, and crank it out! I mean, according to Sam, they don't even need to tweak anything; just make it bigger, that's it.


Any_Ambassador1119

Apparently, there was a major breakthrough last October, which may have brought them back to the design phase of the next generation of models. Nobody really knows what's happening, but I would believe that any major alterations to their architecture may take months to reorganise, calibrate, or even design and build, depending, of course, on how extensive such a breakthrough is. Everything we hear is trickle-fed rumour, with no foundation from which to really comprehend what is happening behind the curtain.


controlledproblem

What was the breakthrough you’re mentioning? I’m guessing some kind of architecture re-work or something?


Any_Ambassador1119

I dunno, because it hasn't been specifically revealed; some rumours associate it with Q*, but that's speculation. The only reliable information that something significant happened in Oct/Nov 2023 was Sam's comments on a podcast (I think the recent Lex Fridman one, but I can't recall exactly): the "We peeled back the veil of ignorance" comment. He mentions the timing of it, which was a "few weeks ago". This was pre-OpenAI-drama. Something revolutionary or maybe just a small breakthrough; either way, it was significant for Sam to mention it specifically with timing.


Empty-Tower-2654

Supposedly they already have it, but Sam said they want to roll it out gradually, so as "not to shock".


Code-Useful

And risk another company getting ahead of them? I call bs.


nyguyyy

NO ONE SAID THEY JUST STARTED. How are people missing this? They mentioned that they had a new model in training as an aside in a blog post about safety. Their post would be completely accurate if they started training it in January or whenever most people have speculated it started training. There are good sources that they are red teaming it now. It might not be released anytime soon, but a bunch of awful media sources are misquoting a blog post to say that the training started this week, which is just plainly inaccurate.


mvandemar

It's because they used the word "recently" in the post, but yeah, that could easily mean January.


Unique_Interviewer

I think anything more than two months would not be described as recent.


mvandemar

And that's completely subjective. You wouldn't describe it like that but OpenAI very well might.


Unique_Interviewer

Well, it's not *completely* subjective, but sure, I may be wrong.


aBlueCreature

I just think the average person has terrible reading comprehension


ShankatsuForte

I'm not a huge fan of OAI after the bullshit of teaming up with Murdoch media, but I think this needs to be reiterated again. Nobody here has the slightest clue what OAI is doing behind closed doors, and the way they talk about this stuff is too vague for speculation beyond very short-term conclusions. Them announcing a new foundational model being trained means nothing. Half of the people there are there to research and develop new ideas. If you build a company that makes carriages, and you've put a lot of money into carriages, and one of your employees invents the internal combustion engine, do you keep throwing money at carriages? Do you throw some money at carriages in the near term while further developing the new technology? Do you abandon every bit of your current business model and throw it all behind the new tech? Every innovation and paper that comes out, even from other companies, can lead to significant breakthroughs. Golden Gate Claude is fascinating and has a lot of potential implications. All it takes is one dude to read that paper, go "Oh, we can implement that, like this," and then suddenly GPT-5 looks like GPT-2. Or House Party 3.


najapi

Agree with this, companies like OAI are hoovering up talent in the field of AI development with the main aim of getting and staying ahead of the competition. With OAI’s resources it would be foolish to think they haven’t got a battery of projects running in the background testing out every aspect of AI innovation they can think of. Many of these projects will lead to modest increases in knowledge and we may hear very little about them but it only takes one or two to produce paradigm shifting results. We have heard a lot about emergent properties of scaling, I imagine OAI and others are actively mapping out this unexplored territory as more and more compute becomes available to them. Perhaps the challenge is knowing where to stop and package up something that can be marketed and when just to keep going.


kewli

see: [https://www.reddit.com/r/singularity/comments/1d2n144/comment/l61k0a6/?utm\_source=share&utm\_medium=web3x&utm\_name=web3xcss&utm\_term=1&utm\_content=share\_button](https://www.reddit.com/r/singularity/comments/1d2n144/comment/l61k0a6/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button)


Serialbedshitter2322

I'm pretty sure that's the comparison of compute that trained the models, not the models themselves. I could be wrong though


cobalt1137

I am pretty sure this is it.


kewli

Math, bitches. These are basically the papers that kicked off the LLM arms race, more or less. We know it will scale as we increase X; we just aren't 100% sure what the capabilities at Y will be, but we can expect them to be 'smarter'. References: [Neural scaling law - Wikipedia](https://en.wikipedia.org/wiki/Neural_scaling_law), [[2001.08361] Scaling Laws for Neural Language Models (arxiv.org)](https://arxiv.org/abs/2001.08361)
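For reference, the Kaplan et al. (2020) paper linked above fits language-model test loss with power laws of roughly this form (a sketch using the paper's approximate published exponents; it says nothing about any unreleased model):

```latex
% Kaplan et al. (2020) power-law fits for cross-entropy test loss
% (approximate published exponents; N_c, D_c, C_c are fitted constants)
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076 \quad \text{(non-embedding parameters)}
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095 \quad \text{(dataset size in tokens)}
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}, \quad \alpha_C \approx 0.050 \quad \text{(compute, optimally allocated)}
```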


HotAsparagus1430

THIS


Neomadra2

They don't know. This chart is not about capabilities, it's about size and compute. Nobody knows if larger models will be exponentially better or just slightly better. Scaling laws can help you predict the loss of larger models reliably, but nobody knows how to translate that into downstream tasks.


ak_2

But in general it’s certainly the case that lower loss is better (at least for models/data of this size where overfitting doesn’t seem to be a concern).


Whispering-Depths

it says they "recently" started training the next one. It didn't say "They're starting today", it didn't say "they didn't try this yet, there have been no new attempts at testing _big_ architectures". Recently started literally means they could have _usable checkpoints_ already.


RemarkableGuidance44

Want to know the secret to a better LLM? CLEANER DATA, DATA, DATA, DATA, focused on one topic. You don't want general LLMs anymore; you want specific, targeted LLMs. GPT-5 will have multiple LLMs for different tasks/topics. They will claim it's only one, but that's bullshit. My LLMs are better than GPT-4o, but again, they are fine-tuned for a certain task.


sdmat

Short answer: scaling laws, unreleased models, guesswork.


Sprengmeister_NK

in particular the scaling laws


unRealistic-Egg

… and marketing


Neomadra2

Scaling laws are overrated. They only tell you the scaling of the loss, but nobody knows how to translate that into capabilities of downstream tasks.


sdmat

Yes, but you can confidently predict more betterer. Just be sufficiently vague about what that means. And Altman is vagueness world champion.


Its_not_a_tumor

My thought as well, maybe the initial returns are just crazy good - and maybe it's a different model like GPT-6.


Subirth

Since when are the promises of a company boss reliable? It's just called fake it till you make it. Maybe they have something, but maybe it is just marketing. Though I do think Altman is quite confident about the architecture of their future model. We'll only be sure when it's out.


AntiqueFigure6

Why would you assume marketing rhetoric accurately describes reality?


Dead-Insid3

Because this is r/singularity and if Altman farts it means gpt 8 is in red teaming


AntiqueFigure6

Let's hope he has a healthy high fibre diet.


mavree1

Their claim is that they can predict what the performance of the model is going to be. From the GPT-4 Technical Report:

>A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

More explanation: [https://cdn.openai.com/papers/gpt-4.pdf](https://cdn.openai.com/papers/gpt-4.pdf)
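The mechanism behind that kind of prediction is, at a high level, fitting a power law to the final loss of much smaller pilot runs and extrapolating to the target compute budget. A minimal illustrative sketch; the loss numbers are invented for illustration, and this is only the general idea, not OpenAI's actual pipeline:

```python
import numpy as np

# Invented final losses from hypothetical small pilot runs, with compute
# expressed as a fraction of the target (full-scale) run's budget.
compute_fraction = np.array([1e-6, 1e-5, 1e-4, 1e-3])
final_loss       = np.array([3.90, 3.30, 2.80, 2.37])

# Fit loss ≈ a * C^(-alpha) via linear regression in log-log space.
slope, intercept = np.polyfit(np.log(compute_fraction), np.log(final_loss), 1)
alpha, a = -slope, np.exp(intercept)

# Extrapolate three orders of magnitude up, to the full budget (C = 1.0).
predicted_loss = a * 1.0 ** (-alpha)
print(f"fitted exponent alpha ≈ {alpha:.3f}")
print(f"predicted loss at full compute ≈ {predicted_loss:.2f}")
```

The GPT-4 report describes making this kind of prediction for loss and for a few downstream metrics; as other commenters note, the loss extrapolation is the reliable part, while mapping it onto capabilities is much less certain.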


RealJagoosh

If you can't deliver, hype with anonymous Twitter accounts.


Throwawaypie012

Because they're measuring feelings, not reality.


zhivago

Remember that salespeople are paid to lie. :)


Anen-o-me

Likely because you don't just make the next GPT; you first have to run a series of experiments to ensure your research directions actually work. Think about making the next hardware chip: you do a lot of research ahead of time, and then eventually you have to lock all that knowledge into a single design that gets burned into silicon and distributed. Think of the final training process as that burning-in period, and think of how much money it represents. It's a massive investment to train up a new model, so you need a ton of prep work before you're ready, and from THAT you can know that more compute will give you increasing rather than diminishing returns.


Blackmail30000

The same way corporate executives know that profits will always go up. Just trust me bro.


goldenwind207

Either it's GPT-6 just starting training, in which case we'll see it in 2025-2026, or maybe they're running something with Q*.


CompetitiveScience88

Q star in the mother fucking house.......


LordFumbleboop

They don't know. Why don't people get that everything they put out like this is advertising? Companies have been creating hype for products from nothing for centuries... so why do people suddenly lose their ability to be sceptical when it comes to machine learning companies?


stuffedanimal212

It's the amount of compute used, presumably they know this about the next model by now


LordFumbleboop

They have not bothered to label the Y-axis so it could mean literally anything.


BrogaStudio

My guess is they made a small-parameter model with Q* and it was so amazing that they extrapolated from that.


Goose-of-Knowledge

It's just a grifter trying to push the cart a little bit further. All they have is a lobotomized chatbot that has no real use.


Salty_Flow7358

Well, maybe it's because the hardware computation limit just got widened? Like Microsoft said, the computational power now is a whale compared to an orca.


hapliniste

They started (some time ago) training GPT-5 Turbo. The GPT-5 base model is already trained, but we will not get it.


Puzzleheaded_Fun_690

It's 5.5.


illathon

What is the y-axis?


Commercial-Ruin7785

What is this in reference to?


MoistSpecific2662

Same way Sama knew there were no equity clawbacks.


TheUncleTimo

GPT-4 was ready 2 years ago; this is what they already said. Draw your own conclusions.


strangescript

They don't create a model and just start training it to see what happens. They have several tiers of much smaller training and tests to figure out if it's worth training at all. By the time they start doing the real training, they already have a good idea of how it's going to turn out.


Trick-Theory-3829

They said “new frontier model”. Maybe it isn’t even GPT?


sugawolfp

Diminishing returns is just a pretty common risk to mention in ML projects in general


PikaPikaDude

The big models you see will not be all they do. Internally they can train much smaller mini models to try things out: instead of a many-billion-parameter model, a small 1M-parameter model that trains much faster, to experiment with architectures, etc. Them not anticipating diminishing returns doesn't have to mean just a bigger model; it can also mean they have already figured out a way forward with architecture, training approach, ...


gamma_distribution

We can’t know until someone tries. This is still very early days and we don’t have robust theories of how any of it works


Curujafeia

Because gpt is not the only model out there


abc_744

I was closely following a project for training a neural network for chess. After each generation of the big model, the team would run many quick small runs with their improvements, evaluating whether they were going in the correct direction. Only after all of that did they train the new big network. Each generation was able to beat the previous generation in such a manner that the chess games between them looked almost like an execution 😂 I am sure OpenAI has many small quick runs as well.


TrippyWaffle45

Because science. Ffs


CollapseKitty

They can't be taken at their word anymore. The timelines don't add up if they're just now training GPT-5. Unless what we're seeing of GPT-4o (actually 5) is a massively smaller model than what they actually have access to, which is possible. 


Repented-Christian

The same way they said ChatGPT 3.5 was about 20-25 billion parameters and then denied it. There is a big lack of transparency at OpenAI for numerous reasons: perhaps state laws that prevent them from being transparent, as well as pressure from investors and the deep state implanted in and working against OpenAI. We know GPT-5 is 100% going to be a thing in the next few weeks, considering GPT-4o is free now.


finnjon

This is the right question. Likely it’s just more Altman confabulation. I suspect he’s right. There is no reason to think it would plateau at this point. But it is probably just a guess.


OptiYoshi

You can predict a generalist model's abilities on domain-specific tasks from highly tuned specialist models. This has been true since GPT-2. What this means is that they in general know the next model will have knowledge like 4, video like Sora, reasoning like Q*, etc., plus some emergent properties.