AI scaling hits diminishing returns


Is AI scaling hitting diminishing returns?

Until 2018, frontier models' training compute grew 7x/year due to a 'compute overhang': you could run small-scale ML experiments cheaply, and deep learning scaled well on small GPU clusters.[1] Scaling data and compute, not algorithms, drove most AI progress. While training methods and fine-tuning have evolved, the fundamental GPT architecture hasn't changed much.[2] The main 'algorithmic breakthrough' was using exponentially more training data.[3] 

In small models (e.g. by Google, FB, Alibaba, Mixtral etc.), compute and capabilities (e.g. reasoning) correlated very strongly (~95%)[4]:

(Figure: compute vs. capability correlation across small models. Source[5])

Since 2018, models have needed larger dedicated clusters and better hardware, and frontier training compute growth has slowed to ~4x per year.[6] 
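To see how much that slowdown compounds, here is a minimal sketch (the 7x/year and 4x/year rates are the Epoch AI estimates cited above; the 4-year window is an arbitrary illustration):

```python
# Cumulative training-compute growth implied by the cited annual rates.
pre_2018_rate = 7    # x/year before 2018
post_2018_rate = 4   # x/year since 2018
years = 4            # arbitrary illustrative window

print(pre_2018_rate ** years)   # 2401x over 4 years at 7x/year
print(post_2018_rate ** years)  # 256x over 4 years at 4x/year
```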

In 2022, OpenAI trained GPT-4 for ~$50M.[7],[8] When it came out in 2023, everyone was surprised by how much smarter it was than GPT-3. This was reflected in big performance jumps on benchmarks for language understanding, math, science, and coding. Of course, all benchmarks are wrong as models of intelligence and capabilities,[9] but some are still useful proxies for the intuition that progress has been decelerating. While AI has gotten somewhat better since, and we might find yet more ways to monetize it, we haven't seen as big a jump in capabilities as the one from GPT-3 to GPT-4.[10] Scale used to be all we needed. The models just wanted to learn.

In 2023, Google trained Gemini 1 for ~$130M, using 2.4x more compute than GPT-4.[11],[12] While it did slightly better on some benchmarks, Google has a lot of data, has the most compute of any firm, and could easily spend $1B to train an AI.[13] If scale were all you needed for another GPT-3-to-4-sized jump, why didn't Google just release a 10x bigger model? Hassabis cited practical limits (e.g. how much compute you can actually fit in one data center), but also said that scaling more than 10x between models is suboptimal: you want to adjust 'the recipe' (e.g. intermediate data points to help course-correct the hyperparameter optimization).[14] Still, some say 2023 was the slowest year of AI progress for some time to come,[15] and forecasts suggest only a ~15% chance of next-gen models plateauing.[16]

In 2024, performance no longer improved linearly with additional scale, but sublinearly.[17]

In 2025, GPT-5 turned out to use less training compute than GPT-4.5 (but GPT-6 probably won't).[24] Many have been disappointed by the release (though see [25]).

And so, the scaling paradigm has hit diminishing returns: generally, returns to scale past the point of increasing returns are roughly logarithmic.[26] As a rule of thumb, past that point, which we have now reached, the next dollar spent on compute, data, or AI scientists at the $1B mark gets you about 10x less than a dollar spent at the $100M mark.
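As a minimal sketch of that rule of thumb (assuming capability grows logarithmically with spend, per the claim above; the coefficient is arbitrary):

```python
import math

# Toy capability curve under the logarithmic-returns assumption above:
# C(S) = a * log10(S) + b, where S is total spend in dollars.
def marginal_return(spend, a=1.0):
    """dC/dS = a / (S * ln 10): capability gained from the next dollar."""
    return a / (spend * math.log(10))

print(marginal_return(1e8) / marginal_return(1e9))   # 10.0: a dollar at $1B buys ~10x less
print(marginal_return(1e8) / marginal_return(1e11))  # 1000.0: ~1000x less at the ~$100B scale
```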

By 2028, a training run might use 160x more compute than the 100K-H100 clusters of 2024, but it would require ~5 GW of power and cost ~$140B.[27] And while by 2030 we might train an AI that is 10,000x bigger than GPT-4 (the same increase as from GPT-2 to GPT-4),[28] this would cost on the order of hundreds of billions of dollars.[29]

Due to diminishing returns, money spent at that scale is roughly 1,000x less effective than money spent around the $100M mark (at which point scaling had already stopped delivering). A fortiori, if we already see diminishing returns, this will only get worse as the low-hanging algorithmic fruit is picked.

There is a case to be made that AI hype is continuously moving the goalpost:

  1. 'Pretraining scaling is all you need' until it stopped delivering.
  2. Inference scaling was hailed as the new paradigm, but it is more expensive, and on challenging questions outside a given base model's capabilities, or under higher inference requirements, scaling pretraining is likely more effective for improving performance.[30] 
  3. Agents were supposed to be the big thing of 2025 and create a revenue feedback loop, but didn't deliver.
  4. Automating AI R&D has been hailed as the most recent approach.                                                                                                                        

Economic data suggests that TAI is not near:

  1. AI firms are not acting as if scaling can keep delivering: AI is undoubtedly booming: firms now invest $175B per year in AI,[31] and in 2023 funding for generative AI increased 8x.[32] In 2025, AI investment might reach $200B globally.[33] But if firms believed scaling would get us to TAI by 2030, or expected progress to continue at the same rate, they would train larger models. Instead, firms focus on lowering inference costs and adding new features[34] (e.g. voice assistants,[35] models that can run locally,[36] and search[37]), but don't do much better on benchmarks.
  2. In industry, anecdotally, the excitement has calmed down. One AI expert who looked at the 900 most popular open source AI tools says: 'in early 2023, all AI conversations I had with companies centered around GenAI, but the recent conversations are more grounded.'[38]
  3. Stock markets also do not expect transformative AI soon: Based on stock market caps, AI might create $2.4T a year in consumer surplus.[39] At 2% of World GDP, that's big, but not TAI.
  4. Interest rates, too, do not imply TAI. Markets don't expect aligned or unaligned AGI in the next 50 years, because if they did, interest rates would be higher than they are now.[40]
  5. Macroeconomic models suggest gains from AI will be large, but not transformative: Most top economists think generative AI has a measurable impact on national innovation,[41] and many agree that AI will substantially boost our per capita income over the next 30 years, perhaps more than the internet.[42],[43] The recent Nobel prize winner's macroeconomic model uses estimates of how many tasks in a variety of jobs will be automated by AI and then estimates the productivity gains from this automation. He finds that the upper bound on GDP growth due to AI is ~1.7% over 10 years.[44] This is much less even than what the stock market expects, though these models assume that no further innovation in AI will happen.

And so, in sum, without feedback loops (e.g. AI improving itself), AI might not improve as fast as before and might not bring about transformative AI in the near term (i.e. a new, qualitatively different future, as per Karnofsky's definition[45]).

Notes

Look at the axis

Ilya on Dwarkesh - scaling era is over

https://x.com/IntologyAI/status/1991186650240806940 

Compute scaling will slow down due to increasing lead times

"Some foresee a positive feedback loop. These AIs are smart enough to find new algorithms to make smarter AIs, which make even smarter AIs, and so on. Very soon, we could see multiple years of AI progress compressed into a single year just through software advances — a 'software intelligence explosion'.

Others agree that AI progress would speed up, but think that something will block the explosive feedback loop. For example, increasing difficulty in finding new algorithms might bottleneck AI self-improvement, or software improvements might depend heavily on physical resources like compute, which can't be scaled as easily."[46]

When Will AI Transform the Economy? - by Andre Infante

Understanding AI Trajectories: Mapping the Limitations of Current AI Systems

Aschenbrenner:

"Scaling provides this baseline enormous force of improvement. GPT-2 was amazing [for its time]. It could string together plausible sentences, but it could barely do anything. It was kind of like a preschooler. GPT-4, on the other hand, could write code and do hard math, like a smart high schooler. This big jump in capability is explored in my essay series.[47] I count the orders of magnitude of compute and algorithmic progress.

Scaling alone, by 2027 or 2028, is going to do another preschool to high school-sized jump on top of GPT-4."

[48]

GPT-5 and GPT-4 were both major leaps in benchmarks from the previous generation | Epoch AI 

AI #142: Common Ground - by Zvi Mowshowitz

https://substack.com/home/post/p-176150498 

G comments:

GPT 4.5 was on trend?

Read: Scaling laws for board games

AI 2027 Tabletop Exercise 

GPT-5 is no slowdown- by Shakeel Hashim and Jasper Jackson 

Why AI's IMO gold medal is less informative than you think

Jacob_Hilton's Shortform — LessWrong 

!!

!! Teaching AI to reason: this year's most important story 

Inference

Will AI R&D Automation Cause a Software Intelligence Explosion? — LessWrong 

The case for AGI by 2030- 80,000 Hours 

'Are Ideas Getting Harder to Find?': across various sectors, research productivity is declining sharply while research effort is rising substantially. Their analysis reveals that maintaining constant growth rates requires ever-increasing research inputs, a pattern observed across multiple domains.

Moore's Law provides a striking illustration of this phenomenon. The researchers calculate that achieving the famous doubling of computer chip density every two years now requires more than 18 times the number of researchers compared to the early 1970s. This translates to research productivity in semiconductor development declining at an average annual rate of 6.8 percent.
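A quick consistency check on those two figures (a sketch; the ~1971-2014 span is my assumption about the Bloom et al. data window, not stated above):

```python
import math

# If research productivity declines at a continuous rate of 6.8%/year, the
# research effort needed to sustain the same output growth rises by e^(0.068*t).
annual_decline = 0.068
years = 2014 - 1971   # assumed data window, ~43 years

required_effort_multiple = math.exp(annual_decline * years)
print(round(required_effort_multiple, 1))  # ~18.6, consistent with "more than 18 times"
```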

perplexity.ai/ review the economic growth literature on declining research productivity

The most important graph in AI right now: time horizon 

'The Less Wrong/Singularity/AI Risk movement started in the 2000s by Yudkowsky and others, which I was an early adherent to, is wrong about all of its core claims around AI risk. It's important to recognize this and appropriately downgrade the credence we give to such claims'

This is a reasonable estimate of the economic output impact of a hypothetical technology that could be summarized as 'LLMs, but better'. However, that's not actually what we should expect from future AI systems at a time horizon of 10 years: we should expect them to become capable of performing many tasks that current AI systems cannot perform at all.

A Bear Case: My Predictions Regarding AI Progress — LessWrong

What AI can currently do is not the story- by Ege Erdil

leogao's Shortform — LessWrong

Amdahl's law- Wikipedia

EpochAI flashcards

Have LLMs Generated Novel Insights? — LessWrong

the upfront capital expenditure required for a cluster to train a model is often 10x larger than the training cost of the model itself. This is because the amortized lifetime of the GPUs in a cluster is on the order of a few years, but training runs only take on the order of a few months. So a model whose training cost, measured in GPU hours times the cost per GPU hour, is $300M requires a capex of $3B or more.
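A minimal sketch of that arithmetic (the 36-month lifetime and 3.6-month run length are illustrative stand-ins for "a few years" and "a few months", not figures from the source):

```python
# If the quoted training cost is the pro-rata share of the cluster's amortized
# cost, upfront capex scales the training cost by lifetime / run length.
cluster_lifetime_months = 36    # "a few years" (assumed)
training_run_months = 3.6       # "a few months" (assumed)
training_cost_usd = 300e6       # GPU-hours x cost per GPU-hour

capex_usd = training_cost_usd * cluster_lifetime_months / training_run_months
print(f"${capex_usd / 1e9:.1f}B")  # ~$3.0B, in line with the "$3B or more" figure
```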

On the perils of AI-first debugging-- or, why Stack Overflow still matters in 2025:: Giles' blog

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training 

Image caption: 'Horizon-length' autonomy score over time, credit to METR: GitHub - METR/eval-analysis-public: Public repository containing METR's DVC pipeline for eval data analysis

https://x.com/AGItechgonewild/status/1887244469570793919 

Marginal cost of Deep Research's performance on Humanity's Last Exam is very high

'And then we hit a wall.

Nobody expected it. Well... almost nobody. Yann LeCun posted his 'I told you so's' all over X. Gary Marcus insisted he'd predicted this all along. Sam Altman pivoted, declaring o3 was actually already ASI.

The first rumors of scaling laws breaking down were already circulating in late 2024. By late 2025, it was clear that test-time scaling was not coming to the rescue. Despite the scaling labs' best efforts, nobody could figure out how to generalize reasoning beyond the comfortable confines of formally verifiable domains. Sure, you could train reasoning models on math and a little bit on coding, but that was it — transformers had reached their limits.'

Google DeepMind CEO Demis Hassabis: The Path To AGI, Deceptive AIs, Building a Virtual Cell - Big Technology Podcast | Podcast on Spotify

Math gamed. Arc based on vision

How much more energy efficient is the human brain currently than o3 in solving frontier math?

Implications of the inference scaling paradigm for AI safety — LessWrong

It’s getting harder to measure just how good AI is getting | Vox 

'people who complain that LLM improvement has slowed are often missing the enormous advances in these multi-modal models. Being able to run prompts against images (and audio and video) is a fascinating new way to apply these models.'

Is AI progress slowing down?

RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts 

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training 

!!  

https://x.com/GarrisonLovely/status/1866945509975638493

Chatbot Arena Leaderboard showing big jump from GPT-3.5[49]

“when ppl say there's not been much AI progress last year... The models have gone from random guessing to CAN ANSWER PHD SCIENCE QUESTIONS! Wtf” https://x.com/ben_j_todd/status/1866342037848768775

'This roughly aligns with my timeline. ARC will be solved within a couple of years.'

'Terence Tao: Epoch benchmark will be solved in a few years'

!! https://x.com/JTLonsdale/status/1861844692750540912 

Thanks to algorithmic progress (Epoch 2024), the amount of compute it takes to train a model of a particular performance halves every 1-3 years.

OpenAI, Google and Anthropic are struggling to build more advanced AI | Hacker News

https://x.com/lmarena_ai/status/1857110672565494098 

Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims — LessWrong 

https://x.com/GarrisonLovely/status/1856506960960729328 

Scaling progress is constrained by the physical training systems[1]. The scale of the training systems is constrained by funding. Funding is constrained by the scale of the tech giants and by how impressive current AI is. Largest companies backing AGI labs are spending on the order of $50 billion a year on capex (building infrastructure around the world). The 100K H100s clusters that at least OpenAI, xAI, and Meta recently got access to cost about $5B. The next generation of training systems is currently being built, will cost $25-$40 billion each (at about 1 gigawatt), and will become available in late 2025 or early 2026.

Without a shocking level of success, for the next 2-3 years the scale of the training compute that the leading AGI labs have available to them is out of their hands, it's the systems they already have or the systems already being built. They need to make the optimal use of this compute in order to secure funding for the generation of training systems that come after and will cost $100-$150 billion each (at about 5 gigawatts). The decisions about these systems will be made in the next 1-2 years, so that they might get built in 2026-2027.

Thus paradoxically there is no urgency for the AGI labs to make use of all their compute to improve their products in the next few months. What they need instead is to maximize how their technology looks in a year or two, which motivates more research use of compute now, rather than immediately going for the most scale current training systems enable. One exception might be xAI, which still needs to raise money for the $25-$40 billion training system. And of course even newer companies like SSI, but they don't even have the $5 billion training systems to demonstrate their current capabilities unless they do something sufficiently different.

—-

fchollet (Hacker News comment):

This roughly aligns with my timeline. ARC will be solved within a couple of years.

There is a distinction between solving ARC, creating AGI, and creating an AI that would represent an existential risk. ARC is a stepping stone towards AGI, so the first model that solves ARC should have taught us something fundamental about how to create truly general intelligence that can adapt to never-seen-before problems, but it will likely not itself be AGI (due to being specialized in the ARC format, for instance). Its architecture could likely be adapted into a genuine AGI, after a few iterations: a system capable of solving novel scientific problems in any domain.

Even this would not clearly lead to an 'intelligence explosion'. The points in my old article on intelligence explosion are still valid: while AGI will lead to some level of recursive self-improvement (as do many other systems!), the available evidence just does not point to this loop triggering an exponential explosion (due to diminishing returns and the fact that 'how intelligent one can be' has inherent limitations brought about by things outside of the AI agent itself). And intelligence on its own, without executive autonomy or embodiment, is just a tool in human hands, not a standalone threat. It can certainly present risks, like any other powerful technology, but it isn't a 'new species' out to get us.

Google DeepMind: Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery.

It's able to:

 Design faster matrix multiplication algorithms

 Find new solutions to open math problems

 Make data centers, chip design and AI training more efficient across @Google.


[1] Training Compute of Frontier AI Models Grows by 4-5x per Year – Epoch AI 

[2] A 61x increase in effective compute see Measuring the Algorithmic Efficiency of Neural Networks 

[3] Training Compute-Optimal Large Language Models

[4] Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? 

[5] Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? 

[6] Training Compute of Frontier AI Models Grows by 4-5x per Year – Epoch AI 

[7]  Cost estimates for GPT-4.ipynb 

[8] The rising costs of training frontier AI models 

[9] Trendlines in AIxBio evals 

[10] How to Beat ARC-AGI by Combining Deep Learning and Program Synthesis 

[11] Machine Learning Trends – Epoch AI 

[12] Epoch’s Gemini Ultra cost.ipynb 

[13]  Epoch AI on X 

[14] Demis Hassabis - Scaling, Superhuman AIs, AlphaZero atop LLMs, Rogue Nations Threat 

[15] Aschenbrenner on X 

[16] Will AI capabilities plateau with the next generation (GPT-5, etc.) of language models? 

[17] Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind 

[18] Google plans to announce its next Gemini model soon  

[19] OpenAI and others seek new path to smarter AI as current methods hit limitations | Reuters 

[20] OpenAI and others seek new path to smarter AI as current methods hit limitations | Reuters 

[21] Google plans to announce its next Gemini model soon  

[22] OpenAI Board Forms Safety and Security Committee 

[23] Google plans to announce its next Gemini model soon  

[24] Why GPT-5 used less training compute than GPT-4.5 (but GPT-6 probably won’t) | Epoch AI 

[25] Is AI Stalling Out? Cutting Through Capabilities Confusion, w/ Erik Torenberg, from the a16z Podcast 

[26] Rescher's Principle of Decreasing Marginal Returns of Scientific Research 

[27]  https://www.lesswrong.com/posts/fdCaCDfstHxyPmB9h/vladimir_nesov-s-shortform?commentId=vpCuEPiJkwoDxhrqN 

[28] Can AI Scaling Continue Through 2030? – Epoch AI 

[29] Can AI Scaling Continue Through 2030? – Epoch AI 

[30] https://arxiv.org/pdf/2408.03314v1#page=10.71 

[31] Annual global corporate investment in AI

[32] AI Index Report 2024  

[33] AI investment forecast to approach $200 billion globally by 2025 | Goldman Sachs 

[34] OpenAI API pricing 

[35] GPT-4o

[36] GoogleDeepMind Nano

[37] OpenAI is readying an AI search product

[38] What I learned from looking at 900 most popular open source AI tools 

[39] The estimates use NVIDIA's stock price, which implies the GPU market will go quickly to $180B, and then increase at the average amount after that. Todd estimates costs for AI software firms and assumes they will have a 30% margin, which suggests they'll have revenues of $800B per year. If AI firms capture 25% of the value they create, that leaves a consumer surplus of $2.4T. The market expects AI software to create trillions of dollars of value by 2027 
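A minimal sketch of that arithmetic (the revenue and value-capture figures are the ones quoted above; treating "value captured" as revenue is my reading of the argument):

```python
ai_software_revenue = 800e9   # implied annual revenue of AI software firms
value_capture_share = 0.25    # fraction of the value created that firms capture

total_value_created = ai_software_revenue / value_capture_share  # $3.2T/year
consumer_surplus = total_value_created - ai_software_revenue     # $2.4T/year
print(f"${consumer_surplus / 1e12:.1f}T per year in consumer surplus")
```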

[40] The argument is as follows:

  1. If we align AI, we'll earn more. There's diminishing utility to more income, so the more confident we are in AI, the more we'll front load consumption (e.g consume/borrow today) knowing that we don't need to save much to have a high income later. If we save less, interest rates go up.
  2. In an unaligned-AGI scenario, we'll die. The more confident we are in this scenario, the less we save, and instead we'll increase consumption/borrowing, as there's no use for savings if we all die. If we lend less, interest rates go up.

Transformative AI, existential risk, and asset pricing 
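A minimal sketch of why both scenarios push real rates up, using a standard Ramsey-rule form of the consumption Euler equation (the functional form and parameter values are illustrative assumptions, not taken from the cited paper):

```python
def real_rate(g, doom_prob, rho=0.01, gamma=1.0):
    """r = rho + gamma * g + doom_prob.
    rho: pure time preference; gamma: inverse elasticity of intertemporal
    substitution; g: expected consumption growth; doom_prob: annual chance
    that savings become worthless (unaligned-AGI scenario)."""
    return rho + gamma * g + doom_prob

print(real_rate(g=0.02, doom_prob=0.00))  # ~3%: business as usual
print(real_rate(g=0.10, doom_prob=0.00))  # ~11%: aligned TAI raising expected growth
print(real_rate(g=0.02, doom_prob=0.05))  # ~8%: meaningful unaligned-AGI risk
```

Observed real rates near the business-as-usual case are what the argument takes as evidence that markets price in neither scenario soon.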

[41]         AI, Innovation and Society–Clark Center Forum

[42] AI and Productivity Growth - Clark Center Forum 

[43] AI and Productivity Growth 2 - Clark Center Forum 

[44] Acemoglu - The simple macroeconomics of AI 

[45] Transformative AI issues (not just misalignment): an overview 

[46] https://epoch.ai/gradient-updates/the-software-intelligence-explosion-debate-needs-experiments 

[47] Aschenbrenner, “Situational Awareness.”

[48] https://www.dwarkesh.com/p/leopold-aschenbrenner 

[49] Chatbot Arena Leaderboard - a Hugging Face Space by lmarena-ai