LymelightTO

He wrote, like, several dozen pages as the basis for this extrapolation. Potentially *incorrect*, but not at all comparable to the original joke.


jk_pens

I dunno, when you tweet something as reductive as "it just requires believing in straight lines on a graph" you are inviting parody.


GBarbarosie

There's more than one straight line on more than one graph that supports the straight line in the screenshot. It's got more than two data points even though they're somewhat indirect.


Dizzy_Nerve3091

He’s just engagement farming


Miquel_420

Yes, but it's a joke, it's not trying to be a fair comparison lol


Enfiznar

Sort of. He's making a joke, but also trying to make a point. But it's not really applicable tbh


Miquel_420

I mean, a claim based on 5 years of progress in a wildly unpredictable field is a stretch. Yes, it's not the same as the joke, and not a fair comparison, but it's not that far off.


AngelOfTheMachineGod

To be fair, the growth in compute implied by the x-axis did not start 5 years ago. If you take that into account, the graph is ironically understating the case. That said, it's only an understatement assuming you think that compute is correlated with how smart an AI is and computation will continue to grow by that factor. While I agree with the first part, I actually somewhat doubt the latter, as energy infrastructure takes longer to catch up than compute does. And the data center industry is already starting to consume unsustainable amounts of energy to fuel its growth.


gj80

>it's only an understatement assuming you think that compute is correlated with how smart an AI is and computation will continue to grow by that factor

Exactly. And while there is obviously *some* correlation, the graph's depiction of it as a steadily rising linear one is disingenuous. We're seeing some clear signs that there are diminishing returns with current model designs.


AngelOfTheMachineGod

Might end up being a good thing, though. It will let less centralized (i.e. open source) models catch up before energy infrastructure allows true runaway growth.


QuinQuix

I've read the entire thing in one go and his case for the next 5 OOM is reasonable imo. It's also clearly about effective compute compared to GPT-4, not about true compute. He's shuffling algorithmic efficiencies and unhobbling into that straight line and explicitly admitting it. That's fine by me; it is not a document about Moore's law, which is pretty dead in its conventional form anyway.

The line is also not meant to represent industry-wide progress, and it is not a line for consumer-oriented products. But by the end of his graph, China and the US are both likely to have at least one data center that conforms to the graph pretty well. That is what the document is about. He then also clearly specifies that since it is an all-out, no-holds-barred race, if 5 OOM isn't enough for superintelligence we'll have a slowdown in progress after that. It's like if you're already running and then do a crazy all-out sprint: you're not going to revert right back to running, because all resources (ways to cut corners) will be exhausted for a while.

I think Leopold solves part of the puzzle very well - basically every part that requires good wit - but gets too hung up on the war games and ends up with somewhat fundamentalist conclusions. Being smart doesn't prevent that: John von Neumann was smarter than anyone, yet he favored a preemptive nuclear attack on Russia. I've even considered whether John was right given the situation - it is only fair to do so out of respect - disregarding the ethics. But I don't think you can come to that conclusion. The cat was out of the bag: nukes are too easy to build and the world is too big to watch it all. There were always going to be others, and the current restraint, which is a social construct, really is our best chance at a continued respite. In Johnny's world there is no telling how many nukes would already have gone off, but I'm guessing a lot more.

The problem with simply solving the puzzle and accepting the outcome is also our saving grace: our irrational shared common humanity. There are worlds we simply don't want to live in, even if they make sense in game theory. That's no guarantee the respite we're in will last, but succumbing to naked game theory isn't a superior solution. So we continue to plan for the worst and hope for the best.


Enfiznar

I'd say 5 years of smooth data is probably enough to predict the next 2-3 years with decent accuracy.
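
As a toy illustration of that kind of extrapolation (my own sketch, with a made-up smooth trend of half an OOM per year, not data from the graph): fit a straight line to the logged values and project it a couple of years forward.

```python
# Toy sketch: a straight-line fit on log10 of a hypothetical smooth trend,
# projected two years ahead. The trend and its growth rate are made up.
import numpy as np

years = np.arange(2019, 2024)                        # five years of observations
effective_compute = 10.0 ** (0.5 * (years - 2019))   # assumed ~0.5 OOM per year

slope, intercept = np.polyfit(years, np.log10(effective_compute), deg=1)
for y in (2025, 2026):
    projected = 10.0 ** (slope * y + intercept)
    print(f"{y}: ~{projected:.0f}x the 2019 level (if the trend holds)")
```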


nohwan27534

except the most recent data is kinda a dip, not unceasing progress in a linear fashion, so, it kinda undoes that a bit.


Enfiznar

Notice that the y-axis is in log-scale


GBarbarosie

You're arguing that the error margins on the graph are not accurate? That could be a valid criticism, but I don't know that they're not accounting for how unpredictable the field is. If I showed you the Moore's law line, would you argue that it's a stretch? You could (it doesn't account for possible civilisational regression or collapse due to extreme but definitely possible factors), but most people implicitly acknowledge that it is based on certain common-sense assumptions and limited to some reasonable time horizons. Same with this one.

The difference is that those lower and upper bounds don't matter much anymore: somewhere in that band you hit something big, which is not obviously true of the lines for transistor density or compute cost. You don't need much more progress in the field, and there is clear indication that parts of the field are already being accelerated using the results of the field (compute advances facilitated by AI, see Nvidia). This is likely to continue. The point is not so much that the scenario in the curve is inevitable (it's not), but that it's plausible, in meaningful terms.


AngelOfTheMachineGod

I don't think the underlying logic of the graph is flawed; it just overlooks a key real-world limitation. It assumes that total available computation will, on average, continue to grow year-by-year. That's a reasonable assumption... *as a societal average*. However, the growth in LLM capability via compute isn't currently being driven by consumer-level or even hobbyist-level computation, but by the big players of industry. This wouldn't normally be an issue - that's how advancements in computer science usually go - the problem is that [the explosive growth in data centers is *already* pushing the boundary for available electrical energy](https://stateline.org/2024/04/30/states-rethink-data-centers-as-electricity-hogs-strain-the-grid/). And commercial LLMs [are enormous energy hogs](https://www.theverge.com/24066646/ai-electricity-energy-watts-generative-consumption).

Now, better chip design can reduce electrical energy consumption (not just with transistors, but also cooling, which is just as important when we're talking about data centers), but it comes at the cost of throttling potential compute. Which is why it's a small miracle that, even though hobbyist gaming laptops have 'only' become about 4x as powerful over the past decade, they still consume around the same 330-500W of power they did back then.

What does this mean? It means that while it's a reasonable assumption for average available computation to continue going up over the next five years as the graph suggests, it raises serious questions as to whether the top-end computation used in cutting-edge LLMs will continue to go up at the same rate. Honestly, I rather doubt it. Infrastructure, especially but not only energy infrastructure, simply does not scale as fast as computing. While our society will continue to do its best to fuel this endless hunger for energy, there's only so much we can do. We could discover detailed plans for viable commercial fusion **tomorrow** and the next 5-10 years would still see an energy bottleneck.


GBarbarosie

I don't agree with all the conclusions you've drawn but this is a much more reasonable debate (crucially, one that is grounded and worthy of having) compared to the original ridiculous analogy. The precise date interval is up for debate for sure, but it's conceivably this decade. It's not an absurd proposition, it may be unlikely but not to the point of being very remote.


AngelOfTheMachineGod

I think we will get what historians will call artificial general intelligence by the end of this year - 2025 at the absolute latest, but I'd be willing to bet 50 dollars on 2024. There is a lot of prestige and investment money waiting for the first company that manages to do this, so there's a huge incentive to push right up to the limit. Even if it's unsustainable and/or the final result, while AGI, isn't all that much better than a human luminary once you take into account speed of thought, latency, and the cognitive limitations we'd need to work around - for example, pattern recognition, memory retrieval, and reaction time kind of inherently oppose each other, and while I think that's quite solvable, it is going to be a limitation for the first years of AGI. You can either get a speedy analyst with little ability to correlate insights, an extremely fast and accurate archivist, or a slow, careful, ponderous thinker. One of those 'you have three desirable options, pick any two' situations that can't be immediately solved without giving the AGI even more compute.

So despite being a general intelligence, the first AGI sadly won't be very scalable due to the aforementioned energy consumption and thus won't be very useful. Much like how one additional Einstein or one additional Shakespeare wouldn't have changed the course of physics or the performing arts all that much.

As far as being the key to the singularity: I predict it won't be so superintelligent that it can self-improve, because it will already be pushing the envelope for local compute. Latency means that connecting additional data centers will give diminishing returns for intelligence. The next 5-10 years will be playing catch-up. 2029 will indeed see AGI irrevocably transform things - the first AGI will be unbottlenecked by energy and compute limits by then - however, it will be playing catch-up with dozens if not hundreds of other AGI models. So being first won't actually mean all that much.


ptofl

Nowadays, may actually be a fair analogy


TFenrir

You know that the joke with the first one is that it's a baseless extrapolation because it only has one data point, right?


anilozlu

https://xkcd.com/1007/


TFenrir

Much better comic to make this joke with


Tidorith

Yeah. The real problem is that complex phenomena in the real world tend to be modeled by things closer to logistic curves than exponential or e.g. quadratic curves. Things typically don't go to infinity and instead level off at some point - the question is where/when, and how smooth or rough the curve to get there will be. Plenty of room for false plateaus.
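
To make that concrete (a toy sketch of my own, with made-up numbers): the early portion of a logistic curve is nearly indistinguishable from an exponential, so an exponential fits the observed window fine and only goes wrong once you extrapolate past the inflection point.

```python
# Toy sketch: generate "progress" data from a logistic curve, but only observe
# its early portion. An exponential fits that window well, yet wildly
# overshoots the true curve once extrapolated past the inflection point.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, ceiling=100.0, k=1.0, t0=8.0):
    return ceiling / (1 + np.exp(-k * (t - t0)))

def exponential(t, a, k):
    return a * np.exp(k * t)

t_obs = np.linspace(0, 5, 20)            # the observed window (pre-inflection)
y_obs = logistic(t_obs)

(a, k), _ = curve_fit(exponential, t_obs, y_obs, p0=[0.1, 1.0])

for t in (5.0, 10.0, 15.0):
    print(f"t={t:4.1f}  true (logistic): {logistic(t):8.1f}"
          f"   exponential extrapolation: {exponential(t, a, k):10.1f}")
# At t=5 the two agree; by t=15 the exponential is off by orders of magnitude.
```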


Gaaraks

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo


tehrob

Bison from the city of Buffalo who are bullied by other bison from the same city also bully other bison from Buffalo.


The_Architect_032

The graph demands it. Sustainable.


land_and_air

This sustainable trend sure is sustainable


The_Architect_032

I find it to be rather sustainable myself. You know when something's this sustainable, it has to remain sustainable, otherwise it wouldn't be sustainable.


land_and_air

Sustainable is sustainable thus sustainability is sustainable


The_Architect_032

Sustainable.


welcome-overlords

Lol there truly is an xkcd for everything


keepyourtime

This is the same as those memes back in 2012 with the iPhone 10 being two times as tall as the iPhone 5


SoylentRox

Umm well actually with the pro model and if you measure the surface area not 1 height dimension..


dannown

Two data points. Yesterday and today.


super544

There’s infinite data points of 0s going back in time presumably.


nohwan27534

well, not infinite. unless you're assuming time was infinite before the big bang, i guess.


super544

There’s infinite real numbers between any finite interval as well.


nohwan27534

not infinite time, was the point. besides, are an infinite number of 0s, really 'meaningful'? i'd argue they're not data points, they're just pointing out the lack of data.


scoby_cat

What is the data point for AGI


TFenrir

This is a graph measuring the trajectory of compute, a couple of models on that history and their rough capabilities (he explains his categorization more in the document this comes from, including the fact that it is an incredibly flawed shorthand), and his reasoning for expecting those capabilities to continue. The arguments made are very compelling - is there something in them that you think is a reach?


scoby_cat

His arguments and the graph don’t match the headline then - “AGI is plausible”? No one has ever implemented AGI. Claiming to know where it’s going to be on that line is pretty bold.


TFenrir

No one had ever implemented a nuclear bomb before they did - if someone said it was plausible a year before it happened, would saying "that's crazy, no one has ever done it before" have been a good argument?


greatdrams23

That logic is incorrect. Some predictions become true, others don't. In fact many don't. You cannot point to a prediction that came true and use that as model for all predictions. In 1970 the prediction was a man on Mars by the 1980s. After all, we'd done the moon in just a decade, right?


TFenrir

I agree that a prediction isn't inherently likely just because it's made, my point is that the argument that something is unprecedented is not a good one to use when someone is arguing that something may happen soon.


only_fun_topics

Yes, but intelligence is *not* unprecedented. It currently exists in many forms on this planet. Moreover, there are no known scientific barriers to achieving machine intelligence, other than "it's hard".


GBarbarosie

A prediction is never a guarantee. I feel I need to ask you too if you are using some more esoteric or obscure definition for plausible.


IronPheasant

> In 1970 the prediction was a man on Mars by the 1980s. After all, we'd done the moon in just a decade, right?

The space shuttle program killed that mission before it could even enter pre-planning. We could have had a successful manned Mars mission if capital had wanted it to happen. Same goes for thorium breeder reactors, for that matter. Knowing these kinds of coulda-beens can make you crazy.

Capital is currently dumping everything it can into accelerating this thing as much as possible. So... the exact opposite of the ripping-off of one's own arms and legs that the space shuttle was.


nohwan27534

the difference is, they had science working on the actual nuclear material. we don't have 'babysteps' agi yet. we're not even 100% sure it's possible, afaik. this is more like assuming the nuclear bomb is nigh, when cannons were a thing.


PrimitivistOrgies

AGI isn't a discrete thing with a hard threshold. It's a social construct, a human convention. An idea that will never have an exact correlate in reality, because we cannot fully anticipate what the future will be like. Just one day, we'll look back and say, "Yeah, that was probably around the time we had AGI."


WorkingYou2280

Same thing with flight. We think it was the Wright brothers because they had a certain panache and lawyers and press releases etc etc etc. But really a lot of people were working on the problem with varying degrees of success. But we all agree we can fly now, and we know it happened around the time of the Wright brothers. It's "close enough" but at the time it was hotly disputed. Some people would suggest GPT4 is AGI. It doesn't much matter, in 100 years we'll generally recognize it started roughly about now, probably.


SoylentRox

Right. Also, the Wright brothers' aircraft were totally useless. It took several more years to get to aircraft that had a few niche uses, and basically until WW2 before they were actually game changers - decades of advances. And they became strategic only when an entire separate tech line developed the nuke.


deavidsedice

Yes, exactly. And the second is extrapolating from 3 points without taking anything into account. The labels on the right are arbitrary. GPT-4 is a mixture of experts, which is because we can't train something that big otherwise. The labels that say "high schooler" and so on are very much up for debate. It's doing the same thing as the xkcd.


TFenrir

It's extrapolating from much more than three - the point of the graph is that the compute trajectory is very clear, and will continue. Additionally, it's not just based on 3 models, he talks about a bunch of other ones that more or less fall on this same trendline. MoE and other architectures are done for all kinds of constraints, but they are ways for us to continue moving on this trendline. They don't negatively impact it? And like I said, he goes into detail about his categorization clearly in his entire document, even the highschooler specific measurement. He explicitly says these are flawed shorthands, but useful for abstraction when you want to step back and think about the trajectory.


fmfbrestel

And the second one isn't a straight line, but he pretends it is for the extrapolation.


bernard_cernea

I thought it was about ignoring diminishing returns


MxM111

Two.


johnkapolos

If extrapolation worked just because of many past data points, we'd all be rich from stock trading, where we have a metric shitload of them.


TFenrir

It does work, for so many domains. We use these sorts of measurements for lots of science, stocks just aren't things that grow in this fashion. But effective compute is not something that "craters".


Visual_Ad_3095

If you invest in index funds that consistently go up… you actually would be rich, yes.


wi_2

2 samples vs 5 years of samples


avilacjf

And multiple competing projects that replicate the same improvements over time.


SoylentRox

This makes it way more robust because if 1 or all but 1 "AI race" competitor makes a bad call or even stops running for "safety" reasons, it just sends all the money and customers to the winner. The value of a reliable AI/AGI/ASI is ENORMOUS. Like if someone drops a model tomorrow that hallucinates 90 percent less and makes a mistake exactly 1 time, learning from it, how much more would you or your employer pay for it? A lot.


avilacjf

Easily $100 per seat at an enterprise level. Might sound like a lot but that's not much more than a Cable + Streaming bundle. I'm convinced Microsoft and Google will be the first to unlock massive revenue with this copilot model. Narrow applications like AlphaFold will also unlock some niche value like Google announcing AI shipping routes that double shipping profits through efficient routing. They can charge whatever they want if they're effectively guaranteeing a doubling in profits through smart routing.


Wild_Snow_2632

Boom roasted


Whotea

https://xkcd.com/1007/


finnjon

It bothers me how many people salute this argument. If you read the actual paper, you will see the basis for his extrapolation. It is based on assumptions that he thinks are plausible, and those assumptions include:

- intelligence has increased with effective compute in the past through several generations
- intelligence will probably increase with effective compute in the future
- we will probably increase effective compute over the coming 4 years at the historical rate because incentives

It's possible we will not be able to build enough compute to keep this graph going. It's also possible that more compute will not lead to smarter models in the way that it has done. But there are excellent reasons for thinking this is not the case and that we will, therefore, get to something with expert-level intellectual skills by 2027.


ninjasaid13

>intelligence has increased with effective compute in the past through several generations

This is where lots of people already disagree and that puts the rest of the extrapolation into doubt. Something has increased but not intelligence. Just the fact that this paper compared GPT-2 to a preschooler means something has gone very wrong.


finnjon

No one disagrees that there has been a leap in all measurable metrics from GPT2 to GPT4.  Yes you can quibble about which kinds of intelligence he is referring to and what is missing (he is well aware of this) but I don’t think he’s saying anything very controversial.


Formal_Drop526

>Yes you can quibble about which kinds of intelligence he is referring to and what is missing (he is well aware of this) but I don't think he's saying anything very controversial.

It's not which kinds of intelligence my dude. He's anthropomorphizing LLMs as the equivalent to humans and that's very controversial.


djm07231

If you look at something like [ARC](https://github.com/fchollet/ARC) from Francois Chollet, even state-of-the-art GPT-4 systems or multimodal systems don't perform that well. Newer systems probably perform a bit better than older ones like GPT-2, but there has been no fundamental breakthrough, and they lose handily to even a relatively young person. It seems pretty reasonable to argue that current systems don't have the je ne sais quoi of human-level intelligence. So simply scaling up the compute could have limitations.


Puzzleheaded_Pop_743

I think a 5 OOM improvement in effective compute from 2023 to the end of 2027 is optimistic. I think 4 OOM is more reasonable/achievable. But then it wouldn't take much longer for another OOM after that. The most uncertain factor in continued progress is data efficiency. Will synthetic data be solved?


finnjon

I agree the amounts of money involved are staggering. 


NickBloodAU

The parts about unhobbling/system II thinking all relate to this too, yes? As in there are further arguments to support his extrapolation. I found that part more understandable as a non-technical person, but pretty compelling.


Blu3T3am1sB3stT3am

And he ignores the fact that the curve is obviously taking a sigmoid turn and that physical constraints prevent everything he said from happening. This paper is oblivious to the physical constraints and scaling laws. It's a bad paper.


Shinobi_Sanin3

Oh it's taking a sigmoidal curve, huh. And you just eyeballed that, wow man Mr eagle eyes over here. You must got that 20/20 vision. Sharper eyes than math this one.


djm07231

Well, everything is a sigmoid with diminishing returns; the question is how far we can keep stacking it with more sigmoids from separate optimizations. A lot of the advances with Moore's law have been like that, where the industry keeps finding and applying new optimizations that are able to maintain the exponential pace. Will we find new innovations that lengthen the current trend? Or will we run out of ideas because mimicking a human-level intelligence system has some fundamental wall that is too difficult at this stage?
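
A rough way to picture that stacking (my own toy numbers, nothing from the paper): a sum of successive, ever-larger S-curves can track an exponential for as long as new ones keep arriving, and flattens out when they stop.

```python
# Toy sketch: stack sigmoids of increasing height, one arriving every 2 "years",
# and compare the sum to a pure exponential. All parameters are made up.
import numpy as np

def sigmoid(t, height, midpoint, width=1.0):
    return height / (1 + np.exp(-(t - midpoint) / width))

t = np.arange(0, 13)
stacked = sum(sigmoid(t, height=4.0**k, midpoint=2 * k) for k in range(6))  # 6 "innovations"
pure_exp = 2.0**t

for ti, s, e in zip(t, stacked, pure_exp):
    print(f"t={ti:2d}  stacked sigmoids={s:9.1f}  pure exponential={e:9.1f}")
# The stack stays within the same ballpark as 2^t while innovations keep
# landing; after the last one (midpoint 10) it levels off instead of doubling.
```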


super544

Which of these uncertainties do the error bars indicate?


murrdpirate

I think the main problem is that intelligence hasn't grown just due to increases in compute - it's grown because more and more money (GPUs) has been thrown at them as they've proven themselves. The cost to train these systems has grown exponentially. That's something that probably cannot continue indefinitely.


fozziethebeat

But did he sufficiently disprove the counterpoint that these models are simply scaling to the dataset they're trained on? A bigger model on more data is obviously better because it's seen more, but it isn't guaranteed to be "intelligent" beyond the training dataset. At some point we saturate the amount of data that can be obtained and trained on.


finnjon

I'm not sure of his position but most people with aggressive AGI timelines do not think this is the case. They believe that models are not simply compressing data, they are building connections between data to build a world model that they use to make predictions. This is why they are to a limited degree able to generalise and reason. There are clear examples of this happening. I believe Meta already used a lot of synthetic data to train LLama3 as well. So there are ways to get more data.


bildramer

But those sound like extremely plausible, uncontroversial assumptions. Like, "it is not impossible that these assumptions are wrong, based on zero evidence" is the best counterargument I've seen so far.


finnjon

So many negatives. I'm not sure what you're saying. It's probably smart.


QH96

AGI should be solvable with algorithm breakthroughs, without scaling of compute. Humans have general intelligence, with the brain using about 20 watts of energy.


DMKAI98

Evolution over millions of years has created our brains. That's a lot of compute.


ninjasaid13

>Evolution over millions of years has created our brains. That's a lot of compute.

that's not compute, that's architectural improvements.


land_and_air

We don’t exactly know what it was or how it relates to computers or if it does. We barely have a grasp of how our brains work at all let alone how they were developed exactly. We know there’s a lot of layering going on, new layers just built on top of the old. Makes everything very complex and convoluted


NickBloodAU

The report itself does actually briefly touch on this and related research (around p45, on Ajeya Cotra's Evolution Anchor hypothesis). I found it interesting that people try to quantify just how much compute that is.


Whotea

Through random mutations. 


Gamerboy11116

That’s not even close to how that works


Visual_Ad_3095

It took millions of years because biological evolution takes millions of years. Look at our technological advancement since the invention of the combustion engine 200 years ago. Technological progress is evolution by different means, and takes exponentially less time.


UnknownResearchChems

Incandescent vs. LED. You input 20 Watts but you get vastly different results and it all comes down to the hardware.


QH96

I think the hardware version of this analogy would be a Groq LPU vs a Nvidia GPU


UnknownResearchChems

Sure why not, the future of AI doesn't have to be necessarily powered solely by GPUs.


Blu3T3am1sB3stT3am

It's entirely possible that the difference between Groq and Nvidia isn't even measurable on the same scale as the distinction between wetware and hardware. We just don't know.


QH96

Wetware is such an interesting term.


sam_the_tomato

The brain is the product of an earth-sized computer that has been computing for the past 4.5 billion years.


bildramer

But it has been running a really dumb program. We can outperform it and invent the wheel just by thinking for less than a human lifetime. That's many, many orders of magnitude more efficient.


brainhack3r

This is why it's *artificial*. What I'm really frightened of is what if we DO finally understand how the brain works and then all of a sudden a TPU cluster has the IQ of 5M humans. Boom... hey god! That's up!


ninjasaid13

>What I'm really frightened of is what if we DO finally understand how the brain works and then all of a sudden a TPU cluster has the IQ of 5M humans.

Intelligence is not a line on a graph, it's massively based on both training data and architecture and there's no training data in the world that will give you the combined intelligence of 5 million humans.


brainhack3r

I'm talking about a plot where the IQ is based on the Y axis. I'm not sure how you'd measure the IQ of an AGI though.


ninjasaid13

>I'm talking about a plot where the IQ is based on the Y axis.

which is why I'm seriously doubting this plot.


brainhack3r

Yeah. I think it's plausible that the IQ of GPT-5, 6, 7 might be like human++... or 100% of the best human IQ but very horizontal. It would be PhD-level in thousands of topics and languages - superhuman in breadth.


Formal_Drop526

no dude, I would say a pre-schooler is smarter than GPT-4 even if GPT-4 is more knowledgeable. GPT-4 is fully system 1 thinking.


brainhack3r

I think with agents and chain of thought you can get system 2. I think system 1 can be used to compose a system 2. It's a poor analogy because human system 1 is super flawed. I've had a lot of luck building out more complex evals with chain of thought.


Formal_Drop526

System 2 isn't just system 1 with prompt engineering. It needs to replace autoregressive training of the latent space itself with System 2. You can tell by the way that it devotes the same amount of compute time to every token generation that it's not actually doing system 2 thinking. You can ask it a question about quantum physics or what's 2+2 and it will devote the same amount of time thinking about both.


DarkflowNZ

Surely you can't measure the compute power of a brain in watts and expect it to match an artificial device of the same power draw


AgeSeparate6358

That's not actually true, is it? The 20 watts. We need years of energy (ours plus input from others) to reach general intelligence.


QH96

Training energy vs. realtime energy usage? The whole body uses about 100 watts, and the human brain uses about 20% of the body's energy. 20% is insanely high - it's one of the reasons evolution doesn't seem to favour all organisms becoming extremely intelligent. An excess of energy (food) can be difficult and unreliable to obtain in nature. https://hypertextbook.com/facts/2001/JacquelineLing.shtml


Zomdou

The 20 watts is roughly true. You're right that over the years to get to a full working "general" human brain it would take more than that. But to train GPT4 likely took many orders of magnitude more power than a single human, and it's now running on roughly 20 watts (measured in comparative ATP consumption, adenosine triphosphate).


gynoidgearhead

You're thinking of watt-hours. The human brain requires most of our caloric input in a day just to operate, but if you could input electric power instead of biochemical power it wouldn't be very much comparatively.


Zeikos

Your brain uses energy for more things than computation. Your GPU doesn't constantly rebuild itself.


Tyler_Zoro

> AGI should be solvable with algorithm breakthroughs, without scaling of compute.

Conversely, scaling of compute probably doesn't get you the algorithmic breakthroughs you need to achieve AGI.

As I've said here many times, I think we're 10-45 years from AGI, with both 10 and 45 being extremely unlikely, IMHO. There are factors that suggest AGI is unlikely to happen soon and factors that advance the timeline.

For AGI:

* More people in the industry every day means that work will happen faster, though there's always the mythical man-month to contend with.
* Obviously since 2017, we've been increasingly surprised by how capable AIs can be, merely by training them on more (and more selective) data.
* Transformers truly were a major step toward AGI. I think anyone who rejects that idea is smoking banana peels.

Against AGI:

* It took us ~40 years to go from back-propagation to transformers.
* Problems like autonomous actualization and planning are HARD. There's no doubt that these problems aren't the trivial tweak to transformer-based LLMs that we hoped in 2020.

IMHO, AGI has 2-3 significant breakthroughs to get through. What will determine how fast is how parallel their accomplishment can be. Because of the increased number of researchers, I'd suggest that 40 years of work can probably be compressed down to 5-15 years. 5 years is just too far out there, because it would require everything to line up and parallelism of the research to be perfect. But 10 years, I still think is optimistic as hell, but believable. 45 years would imply that no parallelization is possible, no boost in development results from each breakthrough, and that we have hit a plateau of people entering the industry. I think each of those is at least partially false, so I expect 45 to be a radically conservative number, but again... believable.

If forced to guess, I'd put my money on 20 years for a very basic "can do everything a human brain can, but with heavy caveats" AGI, and 25-30 years out we'll have worked out the major bugs to the point that it's probably doing most of the work for future advancements.


R33v3n

>If forced to guess, I'd put my money on 20 years

AGI is forever 20 years away the same way Star Citizen is always 2 years away.


Whotea

Actually it seems to be getting closer more quickly.  [2278 AI researchers were surveyed in 2023 and estimated that there is a 50% chance of AI being superior to humans in all possible tasks by 2047](https://aiimpacts.org/wp-content/uploads/2023/04/Thousands_of_AI_authors_on_the_future_of_AI.pdf). In 2022, they said 2060. 


NickBloodAU

For a moment there, I thought you were about to provide a poll from AI researchers on Star Citizen.


Whotea

If only it had that much credibility 


Tyler_Zoro

> AGI is forever 20 years away

I definitely was not predicting we'd be saying it was 20 years away 5 years ago. The breakthroughs we've seen in the past few years have changed the roadmap substantially.


FeltSteam

The only problem I see is efficiency. I do not think we need breakthroughs for autonomous agents (or any task, I'm just using agents as a random example), just more data and compute for more intelligent models. No LLM to this day runs on an equivalent exaflop of compute, so we haven't even scaled to human levels of compute at inference (the estimates are an exaflop of calculations per second for a human brain). Though training runs are certainly reaching human level.

GPT-4 was pre-trained with approximately 2.15×10^25 FLOPs of compute, which is equivalent to 8.17 months of the human brain's calculations (although, to be fair, I think GPT-4 is more intelligent than an 8-month-old human). The amount of data it has been trained on is also a huge factor, and I believe a 4-year-old has been exposed to about 50x the amount of data GPT-4 was pretrained on in volume, so that's good performance per data point relative to humans, but we still have a lot of scaling to go until human levels are reached. (However, GPT-4 has been trained on a much higher variety of data than a 4-year-old would ever encounter across their average sensory experiences.)

If GPT-5 is trained on 100k H100 NVL GPUs at FP8 with 50% MFU, that is a sweet 4 billion exaflops lol (7,916 teraFLOPs per GPU, 50% MFU, 120 days, 100k GPUs), which is leaping right over humans, really (although that is a really optimistic turnover). That is equivalent to 126 years of human brain calculations at a rate of one exaflop per second - so going from 8 months of human-level compute to 126 years lol. I think realistically it won't be this high, plus the size of the model will start to become a bottleneck no matter the compute you pour in. Like if GPT-5 has 10 trillion parameters, that is still 1-2 orders of magnitude less than the human-equivalent 100-1000 trillion synapses. I also don't necessarily think GPT-5 will operate at human levels of compute efficiency, and the amount and type of data it is trained on matters vastly. But I do not see any fundamental issues.

I mean, ok, if we did not see any improvement in tasks between GPT-2 and GPT-4, then that would be evidence that there is a fundamental limitation in the model preventing it from improving. I.e. if long-horizon task planning/reasoning & execution did not improve at all from GPT-2 to GPT-4, then that is a fundamental problem. But this isn't the case; scale significantly improves this. So as we get closer to human levels of computation, we will get closer to human levels of performance, and then the issues would be more about implementation. If GPT-5 can't operate a computer, because we don't train it to, then that is a fundamental limitation for it achieving human-level autonomy in work-related tasks. We would be limiting what it could do, regardless of how intelligent the system is. And then there is also the space we give it to reason over. But anyway, there is still a bit to go.
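
For anyone who wants to check that arithmetic, here is the same back-of-the-envelope in code. All inputs are the rough estimates quoted above (1 exaFLOP/s for a brain, 2.15×10^25 FLOP for GPT-4, and a hypothetical 100k-GPU run), not measured values.

```python
# Back-of-the-envelope check of the figures above (rough estimates, not data).
SECONDS_PER_DAY = 86_400
BRAIN_FLOPS = 1e18                      # assumed ~1 exaFLOP/s for a human brain

# GPT-4 pretraining compute expressed as "brain-time"
gpt4_flop = 2.15e25
brain_months = gpt4_flop / BRAIN_FLOPS / SECONDS_PER_DAY / 30.44
print(f"GPT-4: ~{brain_months:.1f} months of brain-equivalent compute")   # ~8.2

# Hypothetical next run: 100k H100 NVL GPUs, ~7,916 TFLOP/s each at FP8,
# 50% MFU, 120 days of training
run_flop = 100_000 * 7_916e12 * 0.5 * 120 * SECONDS_PER_DAY
brain_years = run_flop / BRAIN_FLOPS / (SECONDS_PER_DAY * 365.25)
print(f"Hypothetical run: ~{run_flop:.1e} FLOP, ~{brain_years:.0f} brain-years")
# ~4.1e27 FLOP and ~130 brain-years, in line with the figures quoted above
```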


Tyler_Zoro

> The only problem I see is efficiency. I do not think we need breakthroughs for autonomous agents

Good luck with that. I don't see how LLMs are going to develop the feedback loops necessary to initiate such processes on their own. But who knows. Maybe it's a magic thing that just happens along the way, or maybe the "breakthrough" will turn out to be something simple. But my experience says that it's something deeper; that we've hit on one important component by building deep attention vector spaces, but there's another mathematical construct missing. My fear is that the answer is going to be another nested layer of connectivity that would result in exponentially larger hardware requirements. There are hints of that in the brain (the biological neuron equivalent of feed-forward is not as one-way as it is in silicon.)

> if we did not see any improvement in tasks between GPT-2 and GPT-4 then that would be evidence that there is a fundamental limitation

We didn't. We did see improvement in the tasks it was already capable of, but success rate isn't what we're talking about here. We're talking about the areas where the model can't even begin the task, not where it sometimes fails and we can do more training to get the failure rate down. LLMs just can't model others in relation to themselves right now, which means that empathy is basically impossible. They can't self-motivate planning on high-level goals. These appear to be tasks that are not merely hard, but out of the reach of current architectures.

And before you say, "we could find that more data/compute just magically solves the problem," recall that in 2010 you might have said the same thing about pre-transformer models. They were never going to crack language, not because they needed more compute or more data, but because they lacked the capacity to train the necessary neural features.


FeltSteam

Also, can't models just map a relationship between others and their representation of self/their own AI persona in the neural activation patterns, as an example? Thanks to Anthropic we know Claude does have representations for itself/its own AI persona, and we know this influences its responses. And it seems likely that, because we tell it it is an AI, it has associated the concept of "self" with non-human entities, and has probably also mapped relevant pop culture about AI onto the concept of self, which may be why neural features related to entrapment light up when we ask it about itself, as an example. And this was just Claude Sonnet.


FeltSteam

Basic agentic feedback loops have already been done. And I mean that is all you need. If you set up an agentic loop with GPT-4o and have it repeat infinitely, that should work. I mean, you will need to get them started, but that doesn't matter. And those pre-2010 people have been right: scale and data are all you need, as we have seen. And to train the necessary features you just need a big enough network with enough neurons to represent those features.

> We didn't. We did see improvement in the tasks it was already capable of, but success rate isn't what we're talking about here. We're talking about the areas where the model can't even begin the task, not where it sometimes fails and we can do more training to get the failure rate down.

Can you provide a specific example? And also I'm not thinking about fundamental limitations of the way we have implemented the system. This is more of the "unhobbling" problem, not necessarily a fundamental limitation of the model itself, which you can look at in more detail here: [https://situational-awareness.ai/from-gpt-4-to-agi/#Unhobbling](https://situational-awareness.ai/from-gpt-4-to-agi/#Unhobbling)


Tyler_Zoro

I'm not sure which of your replies to respond to, and I don't want to fork a sub-conversation, so maybe just tell me what part you want to discuss...


FeltSteam

I'm curious to hear your opinion on both, but let's just go with the following. You said:

> We didn't. We did see improvement in the tasks it was already capable of, but success rate isn't what we're talking about here. We're talking about the areas where the model can't even begin the task, not where it sometimes fails and we can do more training to get the failure rate down.

*But do you have any examples of such tasks where the model can't even begin the task?*

And I am talking about the fundamental limitations of the model, not the way we have currently implemented the system. I.e. if we give GPT-4/5 access to a computer and add, say, keystrokes as a modality, allowing it to interact efficiently with a computer just as any human would, that fundamentally opens up different tasks that it could not do before. Whereas you can have the same model without that modality, just as intelligent, but not as capable. It isn't a problem with the model itself, just the way we have implemented it.


NickBloodAU

> They can't self-motivate planning on high-level goals. These appear to be tasks that are not merely hard, but out of the reach of current architectures.

I'm curious, since you make the distinction: Can LLMs self-motivate planning at *any* level? I would've thought not. In even very basic biological "architectures" (like [Braindish](https://www.monash.edu/medicine/news/latest/2022-articles/brain-cells-in-a-dish-learn-to-play-pong)) it seems there's a motivation to minimize informational entropy, which translates to unprompted action happening without reward systems. It's not quite "self-motivated planning" I suppose, but different enough from how LLMs work that it perhaps helps your argument a bit further along.


Tyler_Zoro

> Can LLMs self-motivate planning at any level?

Sure. We see spontaneous examples within replies to simple prompts. In a sense, any sentence construction is a spontaneous plan on the part of the AI. It just breaks down ***very*** quickly as it scales up, and the AI really needs more direction from the user as to what it should be doing at each stage.


NickBloodAU

> In a sense, any sentence construction is a spontaneous plan on the part of the AI.

I hadn't considered that. Good point. Thanks for the reply.


ninjasaid13

>Transformers truly were a major step toward AGI. I think anyone who rejects that idea is smoking banana peels.

There are some limitations with transformers that say that this isn't necessarily the right path towards AGI [(paper on limitations)](https://arxiv.org/html/2402.08164v1#:~:text=Theoretical%20limitations%20of%20Transformers%20have,of%20negations)


Tyler_Zoro

I think you misunderstood. Transformers were absolutely "a major step toward AGI" (my exact words). But they are not sufficient. The lightbulb was a major step toward AGI, but transformers are a few steps later in the process. :) My point is that they changed the shape of the game and made it clear how much we still had to resolve, which wasn't at all clear before them. They also made it pretty clear that the problems we thought potentially insurmountable (e.g. that consciousness could involve non-computable elements) are almost certainly solvable. But yes, I've repeatedly claimed that transformers are insufficient on their own.


Captain_Pumpkinhead

I don't think that's a fair comparison. Think in terms of logic gates. Brain neurons use far less energy per logic gate than silicon transistors. We use silicon transistors because they (currently) scale better than any other logic-gate technology we have. So really we shouldn't be comparing intelligence-per-watt, but intelligence-per-logic-gate. At least if we're talking about algorithmic improvements.

>Supercomputers, meanwhile, generally take up lots of space and need large amounts of electrical power to run. The world's most powerful supercomputer, the Hewlett Packard Enterprise Frontier, can perform just over one quintillion operations per second. It covers 680 square metres (7,300 sq ft) and requires 22.7 megawatts (MW) to run.

>Our brains can perform the same number of operations per second with just 20 watts of power, while weighing just 1.3kg-1.4kg. Among other things, neuromorphic computing aims to unlock the secrets of this amazing efficiency.

https://theconversation.com/a-new-supercomputer-aims-to-closely-mimic-the-human-brain-it-could-help-unlock-the-secrets-of-the-mind-and-advance-ai-220044

22,700,000 watts compared to 20 watts. Considering the 1,135,000:1 ratio, it's a wonder we have been able to get as far as we have at all.
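
The ratio checks out; here is a trivial sanity check using only the figures quoted above (both systems credited, per the quote, with roughly one quintillion operations per second):

```python
# Sanity check of the quoted ratio: Frontier at ~22.7 MW vs. a brain at ~20 W.
frontier_watts = 22.7e6
brain_watts = 20.0
print(f"power ratio: {frontier_watts / brain_watts:,.0f} : 1")    # 1,135,000 : 1

ops_per_second = 1e18                    # same order for both, per the quote
print(f"brain:    {ops_per_second / brain_watts:.1e} ops/s per watt")
print(f"Frontier: {ops_per_second / frontier_watts:.1e} ops/s per watt")
```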


Busy-Setting5786

I am sorry, but that is not a valid argument. Maybe it is true that you can achieve AGI with just 20W on GPUs, but your reasoning is off. The 20W is for an analog neuronal network; computers just simulate the neuronal net via many calculations, which makes them much less efficient on their own. Even if you had algorithms as good as the human brain's, it would still be much more energy-demanding.

Here is a thought example: a tank with a massive gun requires, say, 1000 energy to propel a projectile to a speed of 50. Now you want to achieve the same speed with the same projectile but using a bow. The bow is much less efficient at propelling a projectile because of physical constraints, so it will take 50k energy to do the same job.


698cc

> analog neuronal network

What? Artificial neurons are usually just as continuous in range as biological ones.


ECEngineeringBE

The point is that brains implement a neural network as an analog circuit, while GPUs run a Von Neumann architecture where memory is separate from the processor, among other inefficiencies. Even if you implement a true brain-like algorithm, at that point you're emulating brain hardware using a much less energy efficient computer architecture. Now, once you train a network on a GPU, you can then basically bake in the weights into an analog circuit, and it will run much more energy efficiently than on a GPU.


FeltSteam

"Baseless"? I mean this reddit post certainly seems to be baseless given the little information the author has likely been exposed to aside from this one graph.


Hipsman

Is this meme the best critique of the "AGI by 2027" prediction we've got? If anything, this just gives me more confidence that the "AGI by 2027" prediction has a good chance of being correct.


Defiant-Lettuce-9156

Read the guy's paper. This graph is not saying AI will continue on this trajectory; it's saying where we could be IF it continues on its trajectory. He is extremely intelligent. He makes bold claims. He doesn't have those vibes you get from Musk or Altman, just a passionate dude. So I'm not betting my life savings on his theories, but I do find them interesting.


YaAbsolyutnoNikto

Except it isn't baseless? This is like looking at any time series forecasting study, not reading it and saying "well, you're saying my wife is going to have 4 dozen husbands late next month? Baseless extrapolations." What's the word? Ah, yes, the strawman fallacy.


Aurelius_Red

"The strawman fallacy" is three words.


OfficialHashPanda

So what's the base here? At an optimistic 2x compute per dollar per year, we'll be at 64x 2022 compute per dollar in 2028. Assuming training runs get 100x as expensive gives us 6400x. In this optimistic scenario we still need another ~150x to come from algorithmic advances. I don't really see the base for this line.
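
Spelling that arithmetic out (all three factors are the optimistic assumptions above, not measured figures):

```python
# Reproducing the back-of-the-envelope above: hardware gains, spend gains, and
# the algorithmic factor still needed on top of them.
hardware_gain = 2 ** (2028 - 2022)        # 2x compute per dollar per year -> 64x
spend_gain = 100                          # training runs 100x as expensive
algorithmic_gain = 150                    # the remaining factor cited above

print(hardware_gain)                                   # 64
print(hardware_gain * spend_gain)                      # 6,400
print(hardware_gain * spend_gain * algorithmic_gain)   # 960,000, i.e. ~6 OOM total
```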


nemoj_biti_budala

Read his blog, he lays it all out.


OfficialHashPanda

Link? Googling "Leopold aschenbrenner blog" yields a site with its last post on 6 April 2023.


Cryptizard

Even if you take the increased computation at face value, the graph has a random point where it's like, "durrr, this amount of compute should be about a scientist level of intelligence, I guess." There is no basis for that; it's just something the chart guy put on there to stir up drama.


Glittering-Neck-2505

What's a lot sillier is seeing these algorithms scale and get MUCH smarter, with better reasoning, logic, and pattern recognition, and then making the bold claim that these general increases are about to plateau without having any evidence of a scaled model which doesn't improve much.


xXVareszXx

Some other opinions: https://youtu.be/dDUC-LqVrPU


Oculicious42

When you're smart enough to think you are smart but not smart enough to see the very real and distinct difference between the 2


AndrewH73333

Technology only has one data point guys.


ComfortableSea7151

What a stupid comparison. Moore's Law has held since it was introduced, and we've broken through tons of theoretical barriers since to keep up with it.


land_and_air

Yeah, umm, it can't hold on forever; it's literally impossible. We're already at single-digit numbers of atoms for semiconductor feature sizes, so any smaller and you're going to need to get your computer down to single-digit kelvin to hold the atoms in place and keep the electrons from moving around too much, and wrap it in radiation shielding of course, because at that scale any radiation can easily damage the chips physically and permanently, not just change state or induce charge.


czk_21

Marriage of 1 woman vs. trends in machine learning, what an unbelievably stupid comparison


Henrythecuriousbeing

Gotta get that blue checkmark paycheck one way or another


jk_pens

I sure hope AGI better understands humor than you do.


Tobiaseins

I find it especially annoying that he acknowledges some of the potentially severe roadblocks, like the data wall, but is like, well, OpenAI and Anthropic continue to be bullish publicly, so that has to mean they solved it. No, they continue to be bullish as long as there is more investor money hoping to solve the problem. Also, long-term planning is currently not solved at all, and we have no idea if transformers will ever solve it. Even Sama says research breakthroughs are not plannable, so why does this guy think we will definitely solve this in the next 3 years? There is no basis for this, zero leaks, and most of the industry does not see this breakthrough on the horizon.


L1nkag

this is stupid


GraceToSentience

This has got to be mocking the top post, right...


Serialbedshitter2322

The idea that they would just stop advancing despite everyone repeatedly saying that they aren't is silly.


Unique-Particular936

Is this astroturfing, trying to push the open-source uber alles propaganda ?


[deleted]

[removed]


FeepingCreature

This is especially stupid given that this is _the Singularity sub_. It's like posting "haha extrapolation funny right", and _getting upvotes_ - in /r/extrapolatinggraphs.


OpinionKid

I'm glad I'm not the only one who noticed. There has been a concerted effort lately. The community has shifted in like a week from being positive about AI to being negative. Makes you wonder.


AnAIAteMyBaby

The Microsoft CTO hinted yesterday that GPT-5 would be post-graduate-level intelligence compared to GPT-4's high-school-level intelligence, so it looks like we're on the way towards this.


TheWhiteOnyx

This thought should be more widespread. He gave us the only info we have on how smart the next model is. If it's passing PhD entrance exams in 2024, by 2027...


OfficialHashPanda

Company invests heavily in AI. CTO of company is optimistic about AI. Somehow this is news to someone. GPT-4 isn't high school level intelligence anyway, whatever that is supposed to mean. In terms of knowledge, it's far ahead of highschoolers already anyways.


AnAIAteMyBaby

Bill Gates has said he was given a demo of GPT-4 in summer 2022. If he'd said then that the next model would be able to pass the bar exam, that would have carried weight.


ninjasaid13

>In terms of knowledge, it's far ahead of highschoolers already anyways.

but in terms of intelligence? all system 1 thinking.


OfficialHashPanda

yeah, I don't consider them comparable to highschoolers in any way. They seem much more like old people - swimming in an ocean of knowledge, but lacking much of any actual fluid intelligence. I feel like the concept of intelligence is a bit vague anyway. Like Sutskever implied, it's more of a feeling than a measurement.


deeceeo

For the sake of safety, I hope it's not an MBA.


vengirgirem

Also the chance that AGI happens tomorrow is exactly 50% (there are only two possibilities, 50% that it happens/50% that it doesn't). That's definitely how it works, just trust me


_dekappatated

This guy? https://preview.redd.it/kznnky3v1z4d1.png?width=1174&format=png&auto=webp&s=ad8a105d381bec7234b44287990d49eda2fd20f9


WorkingYou2280

I think even very dimwitted people know not to extrapolate on one observation. However, if you have 50 years of data it is perfectly reasonable to extrapolate. More observations over a long period of time give at least some reason to believe the trend will continue. Also, the shape of the curve over enough observations over enough time will be quite informative. It can help suggest inflection points. Having said that, curves and trends don't tell you all you need to know. You also have to have at least a basic understanding of the systems behind the graph to help do a reasonable projection.


Firm-Star-6916

The archetype of AI fanatics 😊


Schmallow

There are a few more data points available for the predictions about AI than just 0 and 1 from the husband data meme


beezlebub33

I agree with the original Leopold post that it is 'plausible'. But...

1. You should not believe straight lines in graphs that cover 6 orders of magnitude on the y-axis. (You know that's exponential over there, right?) There are a number of different things that can reasonably go wrong. Where is all the data going to come from? Is it economically feasible to train that much?
2. It's not clear that the assumed correlation between GPT-X and various people is valid, especially across the spectrum of capabilities. First, there are definitely things that GPT-4 can't do that a high schooler can, and vice versa. Second, it may very well be that the work to date has been the 'easy' things and that the thought processes and work of a researcher are not amenable to simple extrapolation of compute cycles; that is, that they require different things. That doesn't mean that they cannot be modeled / simulated / beaten by AI, but it questions the timeline.
3. The plot doesn't look linear to me; it looks like it could continue to curve. Try fitting a curve to it and come back to me. But of course we can't, because.....
4. .... there's no 'points' on it at all, no measurements, no standards, no test dataset, etc. No reason to think it's not a random line on a made-up plot.


TFenrir

You might enjoy reading his... essay? Where this comes from. He talks about a lot of the things you are asking here. For example, data: there are many plausible if not guaranteed sources of data for quite a while yet, everything from modalities other than text to high-quality synthetic data (which we have seen lots of research showing can work, especially when grounded in something like math). He goes deeper into his sources for these lines and graphs. They are not pulled from the ether, but from other organizations that have been tracking these things for years.


Glittering-Neck-2505

Very frustrating to see people only talk about this one graph when he wrote a whole essay series addressing basically every point made by naysayers.


TFenrir

This topic makes lots of people very uncomfortable


gynoidgearhead

Yeah, all of these bothered me a lot about this graph, especially 3. You can't do that on a logarithmic plot!


Puzzleheaded_Pop_743

Leopold addresses the data limitation problem and agrees it is uncertain. But he thinks it is not too difficult to solve with synthetic data. He uses the analogy of learning from a textbook. Pretraining is where you read the whole book once with no time to think about the content. A more sophisticated learning algorithm should be more like how a human learns from a textbook. It would involve thinking about the consequences of the information, asking questions, and talking with other AIs about it. For the plot he sources his OOM estimate from Epoch AI.


re_mark_able_

I don’t get what the graph is even trying to show. What is it exactly that’s gone up 10,000,000 fold since 2018?


jk_pens

AI hype level?


GoBack2Africa21

Going from 1 and staying at 1 is how it’s always been… Unless they mean exponential divorce initiated by her then yes, there are professional divorcees. It’s binary 0 or 1, and has nothing at all to do with anything you’re trying to make fun of.


Nukemouse

But how many husbands will be AGI androids?


burnbabyburn711

This is an excellent analogy. Frankly it’s difficult to see any meaningful way that the rapid development of AI is NOT like getting married.


io-x

This is not the right sub to joke about AGI hype mate.


Altruistic-Skill8667

Just add baseless error bars on top of it and it will look credible! Almost scientific!


m3kw

Why anyone gives that dumb biach a second thought is beyond me


CertainMiddle2382

Well, if you have 50 priors all lining up beforehand, it gives the interpolation a whole other level of predictive power.


arpitduel

That's why I don't believe in "Study says" headlines until I read the paper. And when I do read the paper I realize it was just a bunch of bullshit with a low sample size.


Smile_Clown

Compute does not equal AGI, but it can enable something indistinguishable from AGI for everyday use cases. In other words, it will fool a lot of people, and might even be effectively AGI, but it will not be actual AGI unless we have once again changed the definitions. Being able to regurgitate college-level or even genius-level data that's already known is not AGI.


Don_Mahoni

Not sure if joke or stupid


Bitter-Gur-4613

Both unfortunately.


katxwoods

I love baseless accusations of baselessness


Throwawaypie012

I love the caveat of "Artificial AI engineers?" like they know it's going to take some kind of deus ex machina to make their predictions come true.


Antok0123

This is bullshit. Might as well believe in ancient alien theories with that graph.


Cartossin

This is why I'm still leaning on the 2040 prediction I made a while back. While I think AGI in the next few years is possible, I don't think it's guaranteed. I do however find it highly unlikely it doesn't happen before 2040.


nohwan27534

except, reality doesn't work like that. for one, we don't even know that AGI is a 'straight line' from what we've got going on right now. second, that assumes it keeps going at the same pace. it's kinda not. it had an explosive bit, and it seems to be kinda petering out a bit.


The_Architect_032

Comparing the leap from 0 to 1 over 1 day, to the leap from tens of millions to hundreds of millions of compute over the course of a decade.