AI has slowed (and we might be headed for an AI w…
Is AI scaling hitting diminishing returns?
Until 2018, frontier models' training compute grew ~7x/year thanks to a 'compute overhang': small-scale ML experiments were cheap to run, and deep learning scaled well even on small GPU clusters.[1] Scaling data and compute, not algorithms, drove most AI progress. While training methods and fine-tuning have evolved, the fundamental GPT architecture hasn't changed much.[2] The main 'algorithmic breakthrough' was using exponentially more training data.[3]
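To make 'scaling drove progress' concrete, here is a minimal sketch of the kind of scaling law this refers to; the coefficients are the commonly cited Chinchilla fits (Hoffmann et al., 2022) and are used purely for illustration, not figures from this post:

```python
# A minimal sketch of a Chinchilla-style scaling law: loss falls as a smooth
# power law in parameters N and training tokens D, so most capability gains
# came from spending more compute and data rather than from new algorithms.
# Coefficients below are the commonly cited fitted values (illustrative only).

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

for n_params in (1e9, 1e10, 1e11):
    n_tokens = 20 * n_params  # rough compute-optimal tokens-per-parameter ratio
    print(f"{n_params:.0e} params -> predicted loss ~ {chinchilla_loss(n_params, n_tokens):.2f}")
```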
In small models (e.g. from Google, FB, Alibaba, Mistral), compute and capabilities (e.g. reasoning) correlated very strongly (~95%)[4]:
Source[5]
Since 2018, models have needed larger dedicated clusters and better hardware, and frontier training compute growth has slowed to ~4x per year.[6]
In 2022, OpenAI trained GPT-4 for ~$50M.[7],[8] When it came out two years ago, in 2023, everyone was surprised by how much smarter it was than GPT-3, reflected in big performance jumps on benchmarks for language understanding, math, science, and coding. Of course, all benchmarks are wrong models of intelligence and capabilities,[9] but some are still useful proxies, and they back up the intuition that progress has since been decelerating. While AI has gotten somewhat better since, and we might find yet more ways to monetize it, we haven't seen as big a jump in capabilities as the one from GPT-3 to GPT-4.[10] Scale used to be all we needed. The models just wanted to learn.
In 2023, Google trained Gemini 1 for ~$130M, using 2.4x more compute than GPT-4.[11],[12] While it did slightly better on some benchmarks, Google has a lot of data and the most compute of any firm, and can easily spend $1B to train an AI.[13] If scale were all you needed for another GPT-3-to-4-sized jump, why didn't Google just release a 10x bigger model? Hassabis cited practical limits (e.g. how much compute you can actually fit in one data center), but also said that scaling more than 10x between models is suboptimal: you want to adjust 'the recipe' (e.g. intermediate data points to help course-correct the hyperparameter optimization).[14] And so, some say 2023 was the slowest year of AI progress we'll see for some time to come,[15] and forecasts also suggest only a ~15% chance of next-gen models plateauing.[16]
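A quick back-of-the-envelope check on the Gemini and GPT-4 figures above (a sketch using only the rough cost and compute estimates already cited):

```python
# Rough sanity check: did Gemini 1's cost grow roughly in line with its
# compute multiple over GPT-4? (Both figures are the rough estimates above.)
gpt4_cost, gemini_cost = 50e6, 130e6   # USD
compute_ratio = 2.4                    # Gemini 1 training FLOP vs GPT-4

cost_ratio = gemini_cost / gpt4_cost
print(f"cost ratio: {cost_ratio:.1f}x vs compute ratio: {compute_ratio:.1f}x")
# ~2.6x vs 2.4x: cost scaled roughly in line with compute, so each further
# 10x of scale has to be paid for more or less in full.
```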
In 2024, performance no longer improved linearly, but sublinearly:[17]
In 2025, GPT-5 turned out to use less training compute than GPT-4.5 (but GPT-6 probably won't).[24] Many were disappointed by the release (though see [25]).
And so, the scaling paradigm has hit diminishing returns: past the point of increasing returns, returns to scale are generally logarithmic.[26] As a rule of thumb, once that point is reached, as it now has been, the next dollar spent on, say, compute, data, or AI scientists at the $1B mark gets you about 10x less than a dollar spent at the $100M funding mark.
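As a sketch of what that rule of thumb assumes (purely illustrative: capability modeled as k·log(spend)):

```python
# Purely illustrative: if capability ~ k * log(spend), the marginal return
# per extra dollar is k / spend, so it shrinks in proportion to what has
# already been spent.
def marginal_return(spend: float, k: float = 1.0) -> float:
    return k / spend

ratio = marginal_return(100e6) / marginal_return(1e9)
print(f"a dollar at $1B buys ~{ratio:.0f}x less than a dollar at $100M")  # ~10x
```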
By 2028, a training run might use 160x more compute than the 100K H100s of 2024, but it would require ~5 GW of power and cost ~$140B.[27] And while by 2030 we might train an AI that is 10,000x bigger than GPT-4 (the same increase as from GPT-2 to GPT-4[28]), this would cost on the order of hundreds of billions of dollars.[29]
Due to diminishing returns, the money spent to get there would be ~1,000x less effective than money spent around the $100M mark (at which point scaling had already stopped delivering). A fortiori, if we already see diminishing returns, this will only get worse as the low-hanging algorithmic fruit is plucked.
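Two quick consistency checks on the figures above, assuming ~4x/year compute growth and the same illustrative log-returns rule of thumb (both assumptions for illustration, not claims from the cited sources):

```python
import math

# 1) How long does a 10,000x compute increase take at ~4x/year growth?
years = math.log(10_000) / math.log(4)
print(f"~{years:.1f} years")  # ~6.6 years, i.e. GPT-4 (2022) -> around 2029-2030

# 2) Under the log-returns rule of thumb, how much weaker is the marginal
#    dollar at the ~$140B cluster scale than at the $100M scale?
print(f"~{140e9 / 100e6:.0f}x less effective")  # ~1,400x, i.e. order 1,000x
```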
There is a case to be made that AI hype keeps moving the goalposts:
Economic data suggests that transformative AI (TAI) is not near:
And so, in sum, without feedback loops (e.g. AI improving itself), AI might not improve as fast as before, and might not bring about transformative AI in the near term (i.e. a new, qualitatively different future, as per Karnofsky's definition[45]).
Notes
Look at the axis
–
Ilya on Dwarkesh - scaling era is over
–
https://x.com/IntologyAI/status/1991186650240806940
Compute scaling will slow down due to increasing lead times
“Some foresee a positive feedback loop. These AIs are smart enough to find new algorithms to make smarter AIs, which make even smarter AIs, and so on. Very soon, we could see multiple years of AI progress compressed into a single year just through software advances — a “software intelligence explosion”.
Others agree that AI progress would speed up, but think that something will block the explosive feedback loop. For example, increasing difficulty in finding new algorithms might bottleneck AI self-improvement, or software improvements might depend heavily on physical resources like compute, which can’t be scaled as easily.”[46]
When Will AI Transform the Economy? - by Andre Infante
Understanding AI Trajectories: Mapping the Limitations of Current AI Systems
Aschenbrenner: “Scaling provides this baseline enormous force of improvement. GPT-2 was amazing [for its time]. It could string together plausible sentences, but it could barely do anything. It was kind of like a preschooler. GPT-4, on the other hand, could write code and do hard math, like a smart high schooler. This big jump in capability is explored in my essay series.[47] I count the orders of magnitude of compute and algorithmic progress. Scaling alone, by 2027 or 2028, is going to do another preschool to high school-sized jump on top of GPT-4.”
GPT-5 and GPT-4 were both major leaps in benchmarks from the previous generation | Epoch AI
AI #142: Common Ground - by Zvi Mowshowitz
https://substack.com/home/post/p-176150498
G comments:
GPT 4.5 was on trend?
Read: Scaling laws for board games
–
GPT-5 is no slowdown - by Shakeel Hashim and Jasper Jackson
Why AI's IMO gold medal is less informative than you think
Jacob_Hilton's Shortform — LessWrong
!!
!! Teaching AI to reason: this year's most important story
—
Inference
–
Will AI R&D Automation Cause a Software Intelligence Explosion? — LessWrong
The case for AGI by 2030 - 80,000 Hours
'Are Ideas Getting Harder to Find?': across various sectors, research productivity is declining sharply while research effort is rising substantially. Their analysis reveals that maintaining constant growth rates requires ever-increasing research inputs - a pattern observed across multiple domains.
Moore's Law provides a striking illustration of this phenomenon. The researchers calculate that achieving the famous doubling of computer chip density every two years now requires more than 18 times the number of researchers compared to the early 1970s. This translates to research productivity in semiconductor development declining at an average annual rate of 6.8 percent.
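A quick check that the 18x and 6.8%-per-year figures are mutually consistent (a sketch; the ~43-year window from the early 1970s to the paper's data endpoint is my assumption):

```python
import math

# If research productivity falls ~6.8%/year (continuously compounded), then
# holding the output growth rate fixed requires exp(0.068 * t) times more
# researchers after t years.
years = 43  # early 1970s to mid-2010s (assumed window)
print(f"~{math.exp(0.068 * years):.0f}x more researchers")  # ~19x, close to the 18x figure
```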
perplexity.ai/ review the economic growth literature on declining research productivity
The most important graph in AI right now: time horizon
This is a reasonable estimate of the economic output impact of a hypothetical technology that could be summarized as 'LLMs, but better'. However, that's not actually what we should expect from future AI systems at a time horizon of 10 years: we should expect them to become capable of performing many tasks that current AI systems cannot perform at all.
A Bear Case: My Predictions Regarding AI Progress — LessWrong
What AI can currently do is not the story - by Ege Erdil
leogao's Shortform — LessWrong
Have LLMs Generated Novel Insights? — LessWrong
the upfront capital expenditure required for a cluster to train a model is often 10x larger than the training cost of the model itself. This is because the amortized lifetime of the GPUs in a cluster is on the order of a few years, but training runs only take on the order of a few months. So a model whose training cost, measured in GPU hours times the cost per GPU hour, is $300M requires a capex of $3B or more.
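A minimal sketch of that arithmetic; the 3-month run and 36-month amortization period are illustrative assumptions, not figures from the quote:

```python
# If GPU-hour rental prices roughly amortize the cluster's purchase cost over
# its lifetime, then capex ~ training cost * (lifetime / run duration).
training_cost = 300e6   # USD, GPU-hours * rental rate (figure from the quote)
run_months = 3          # assumed training run duration
lifetime_months = 36    # assumed amortization period

capex = training_cost * (lifetime_months / run_months)
print(f"implied cluster capex: ~${capex / 1e9:.1f}B")  # ~$3.6B, i.e. ~10x the run cost
```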
On the perils of AI-first debugging-- or, why Stack Overflow still matters in 2025:: Giles' blog
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Caption: 'Horizon-length' autonomy score over time, credit to METR: GitHub - METR/eval-analysis-public: Public repository containing METR's DVC pipeline for eval data analysis
https://x.com/AGItechgonewild/status/1887244469570793919
The marginal cost of Deep Research's performance on Humanity's Last Exam is very high
'And then we hit a wall.
Nobody expected it. Well... almost nobody. Yann LeCun posted his 'I told you so's' all over X. Gary Marcus insisted he'd predicted this all along. Sam Altman pivoted, declaring o3 was actually already ASI.
The first rumors of scaling laws breaking down were already circulating in late 2024. By late 2025, it was clear that test-time scaling was not coming to the rescue. Despite the scaling labs' best efforts, nobody could figure out how to generalize reasoning beyond the comfortable confines of formally verifiable domains. Sure, you could train reasoning models on math and a little bit on coding, but that was it — transformers had reached their limits.'
Math gamed. Arc based on vision
How much more energy efficient is the human brain currently than o3 in solving frontier math?
Implications of the inference scaling paradigm for AI safety — LessWrong
It’s getting harder to measure just how good AI is getting | Vox
—
'people who complain that LLM improvement has slowed are often missing the enormous advances in these multi-modal models. Being able to run prompts against images (and audio and video) is a fascinating new way to apply these models.'
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
!!
https://x.com/GarrisonLovely/status/1866945509975638493
Chatbot Arena Leaderboard showing big jump from GPT-3.5[49]
“when ppl say there's not been much AI progress last year... The models have gone from random guessing to CAN ANSWER PHD SCIENCE QUESTIONS! Wtf” https://x.com/ben_j_todd/status/1866342037848768775
'This roughly aligns with my timeline. ARC will be solved within a couple of years.'
'Terence Tao - Epoch benchmark will be solved in a few years'
!! https://x.com/JTLonsdale/status/1861844692750540912
Thanks to algorithmic progress (Epoch 2024), the amount of compute it takes to train a model of a particular performance halves every 1-3 years.
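A sketch of what that halving rate compounds to over time (the 6-year window is an arbitrary illustration):

```python
# Compute needed for fixed performance halving every h years means the
# effective compute from algorithms alone grows by 2**(t / h) over t years.
t = 6  # years, arbitrary illustration
for h in (1, 2, 3):
    print(f"halving every {h}y -> ~{2 ** (t / h):.0f}x algorithmic gain over {t}y")
# 1y: ~64x, 2y: ~8x, 3y: ~4x -- stacked on top of hardware scaling.
```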
OpenAI, Google and Anthropic are struggling to build more advanced AI | Hacker News
https://x.com/lmarena_ai/status/1857110672565494098
Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims — LessWrong
https://x.com/GarrisonLovely/status/1856506960960729328
Scaling progress is constrained by the physical training systems. The scale of the training systems is constrained by funding. Funding is constrained by the scale of the tech giants and by how impressive current AI is. Largest companies backing AGI labs are spending on the order of $50 billion a year on capex (building infrastructure around the world). The 100K H100s clusters that at least OpenAI, xAI, and Meta recently got access to cost about $5B. The next generation of training systems is currently being built, will cost $25-$40 billion each (at about 1 gigawatt), and will become available in late 2025 or early 2026.
Without a shocking level of success, for the next 2-3 years the scale of the training compute that the leading AGI labs have available to them is out of their hands, it's the systems they already have or the systems already being built. They need to make the optimal use of this compute in order to secure funding for the generation of training systems that come after and will cost $100-$150 billion each (at about 5 gigawatts). The decisions about these systems will be made in the next 1-2 years, so that they might get built in 2026-2027.
Thus paradoxically there is no urgency for the AGI labs to make use of all their compute to improve their products in the next few months. What they need instead is to maximize how their technology looks in a year or two, which motivates more research use of compute now, rather than immediately going for the most scale current training systems enable. One exception might be xAI, which still needs to raise money for the $25-$40 billion training system. And of course even newer companies like SSI, but they don't even have the $5 billion training systems to demonstrate their current capabilities unless they do something sufficiently different.
—
fchollet (Hacker News comment):
This roughly aligns with my timeline. ARC will be solved within a couple of years.
There is a distinction between solving ARC, creating AGI, and creating an AI that would represent an existential risk. ARC is a stepping stone towards AGI, so the first model that solves ARC should have taught us something fundamental about how to create truly general intelligence that can adapt to never-seen-before problem, but it will likely not itself be AGI (due to be specialized in the ARC format, for instance). Its architecture could likely be adapted into a genuine AGI, after a few iterations-- a system capable of solving novel scientific problems in any domain.
Even this would not clearly lead to 'intelligence explosion'. The points in my old article on intelligence explosion are still valid-- while AGI will lead to some level of recursive self-improvement (as do many other systems!) the available evidence just does not point to this loop triggering an exponential explosion (due to diminishing returns and the fact that 'how intelligent one can be' has inherent limitations brought about by things outside of the AI agent itself). And intelligence on its own, without executive autonomy or embodiment, is just a tool in human hands, not a standalone threat. It can certainly present risks, like any other powerful technology, but it isn't a 'new species' out to get us.
—
Google DeepMind: Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery.
It's able to:
Design faster matrix multiplication algorithms
Find new solutions to open math problems
Make data centers, chip design and AI training more efficient across @Google.
[2] A 61x increase in effective compute; see 'Measuring the Algorithmic Efficiency of Neural Networks'
[12] Epoch’s Gemini Ultra cost.ipynb
[25] Is AI Stalling Out? Cutting Through Capabilities Confusion, w/ Erik Torenberg, from the a16z Podcast
[27] https://www.lesswrong.com/posts/fdCaCDfstHxyPmB9h/vladimir_nesov-s-shortform?commentId=vpCuEPiJkwoDxhrqN
[39] The estimates use NVIDIA's stock price, which implies the GPU market will quickly grow to $180B and then grow at the average rate after that. Todd estimates costs for AI software firms and assumes they will have a 30% margin, which suggests they'll have revenues of $800B per year. If AI firms capture 25% of the value they create, that leaves a consumer surplus of $2.4T. The market expects AI software to create trillions of dollars of value by 2027.
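The arithmetic implied by that note (a sketch using only the figures quoted above):

```python
# If AI software firms earn $800B/year and capture 25% of the value they
# create, total value is revenue / 0.25 and the remainder is consumer surplus.
revenue = 800e9
capture_share = 0.25

total_value = revenue / capture_share        # $3.2T
consumer_surplus = total_value - revenue     # $2.4T
print(f"consumer surplus: ~${consumer_surplus / 1e12:.1f}T")
```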
[40] The argument is as follows:
[44] Acemoglu - The simple macroeconomics of AI
[47] Aschenbrenner, “Situational Awareness.”