1 of 27

AI accelerating AI progress: risks and mitigations

2 of 27

Plan

  • What has driven recent AI progress?
  • How much might AGI accelerate AI progress?
  • What risks could that pose?
  • How can labs reduce the risks?

3 of 27

Plan

  • What has driven recent AI progress?
  • How much might AGI accelerate AI progress?
  • What risks could that pose?
  • How can labs reduce the risks?

4 of 27

Drivers of progress

Growth of effective compute

More spending, cheaper compute, better algorithms

In the last decade (h/t Epoch; the multipliers are combined in the sketch below):

  • Compute in the largest training runs increased by ~4X per year
    • ~1.3X from compute getting cheaper
    • ~3X from more spending on compute
  • Algorithms reduced compute requirements by ~2.5X per year
    • Progress towards AGI might be different
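
A minimal sketch of how these multipliers compound (my own back-of-the-envelope using the rough figures above; the combined ~10X/year number is not from Epoch):

```python
# Rough arithmetic: how spending, price-performance, and algorithmic progress
# compound into "effective compute" growth (figures from the bullets above).
spend_growth = 3.0        # ~3X/year from more spending on compute
price_perf_growth = 1.3   # ~1.3X/year from compute getting cheaper
algo_progress = 2.5       # ~2.5X/year reduction in compute requirements

physical_compute_growth = spend_growth * price_perf_growth          # ~4X/year
effective_compute_growth = physical_compute_growth * algo_progress  # ~10X/year

print(f"Physical compute:  ~{physical_compute_growth:.1f}X per year")   # ~3.9X
print(f"Effective compute: ~{effective_compute_growth:.1f}X per year")  # ~9.8X
```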

5 of 27

Drivers of progress

  • Fine-tuning: RLHF, Constitutional AI, Minerva
  • Prompting: chain-of-thought, few-shot prompting
  • Tool-use: Toolformer, internet browsing
  • Scaffolding: Reflexion, RETRO, LM agents
  • Runtime efficiency gains: quantization, FlashAttention

Individual enhancements often improve performance by more than a 5X increase in training compute would (WIP; illustrated in the sketch below).

Key takeaway: better software is responsible for ~50% of recent progress.
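
One way to make "worth more than 5X training compute" concrete is a compute-equivalent gain. The sketch below assumes an illustrative power-law relation between loss and training compute; the exponent and loss numbers are hypothetical, not from the talk:

```python
# Illustrative compute-equivalent gain: assume loss(C) = a * C**(-b) for
# training compute C (a toy power law; b below is hypothetical). If an
# enhancement moves loss from L0 to L1 at fixed compute, the equivalent
# compute multiplier k satisfies L1 = a * (k * C)**(-b), so:
#     k = (L0 / L1) ** (1 / b)
def compute_equivalent_gain(loss_before: float, loss_after: float, b: float = 0.05) -> float:
    return (loss_before / loss_after) ** (1.0 / b)

# Hypothetical example: an enhancement cuts loss from 2.10 to 1.92.
# With b = 0.05 that is worth roughly a 6X increase in training compute.
print(f"{compute_equivalent_gain(2.10, 1.92):.1f}X")
```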

6 of 27

Plan

  • What has driven recent AI progress?
  • How much might AGI accelerate AI progress?
  • What risks could that pose?
  • How can labs reduce the risks?

7 of 27

AI is beginning to accelerate AI progress

  • Copilot speeds up coding.
  • AI helps to design AI chips.
  • AI helps to compile code.

Future AI will accelerate things by more.

8 of 27

What changes when we have AGI?

Caveat: maybe AGI initially requires loads of runtime compute (e.g. best-of-N sampling).

AGI = AI that can fully automate AI R&D.

Key difference: abundant cognitive labour.

Toy example:

  • AGI takes 3e27 FLOP to train.
  • Training lasts 4 months → the training cluster runs at ~3e20 FLOP/s.
  • Chinchilla scaling → AGI uses ~7e12 FLOP per forward pass.

→ AGI performs ~40 million forward passes per second with its training compute (spelled out in the sketch below).
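
The same arithmetic in code (the 7e12 FLOP-per-forward-pass figure follows the slide's Chinchilla-style estimate):

```python
# Toy example from the slide: how many AGI forward passes the training
# cluster could run per second, if redirected from training to inference.
SECONDS_PER_MONTH = 30 * 24 * 3600

training_flop = 3e27                 # toy assumption: total FLOP to train AGI
training_time_s = 4 * SECONDS_PER_MONTH
flop_per_forward_pass = 7e12         # Chinchilla-style estimate from the slide

cluster_flop_per_s = training_flop / training_time_s               # ~3e20 FLOP/s
forward_passes_per_s = cluster_flop_per_s / flop_per_forward_pass  # ~4e7

print(f"Cluster throughput:   {cluster_flop_per_s:.1e} FLOP/s")
print(f"Forward passes per s: {forward_passes_per_s:.1e}")
```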

9 of 27

What is the effect of abundant cognitive labour?

Two contrasting perspectives:

  1. More input → more output: big acceleration.
    • If cognitive labour drives 50% of progress and cognitive labour increases 10X, progress increases by ~5X (the other 50% stays fixed; see the sketch below).
  2. Bottlenecks: minor acceleration.
    • Computational experiments, humans-in-the-loop, safety.
    • No amount of cognitive labour could accelerate progress by >2X.

We don’t know which perspective is right.
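
A minimal way to formalise the two perspectives (the functional forms and the placement of the 2X cap are my own illustration):

```python
# Two toy models of how a 10X increase in cognitive labour changes the pace
# of AI progress (illustrative functional forms only).

def more_input_more_output(labour_mult: float, labour_share: float = 0.5) -> float:
    # Perspective 1: the share of progress driven by cognitive labour scales
    # with labour; the rest (e.g. compute) stays fixed.
    return labour_share * labour_mult + (1 - labour_share) * 1.0

def bottlenecked(labour_mult: float, max_speedup: float = 2.0) -> float:
    # Perspective 2: experiments, humans-in-the-loop, and safety reviews cap
    # the achievable speed-up, however much cognitive labour is added.
    return min(max_speedup, more_input_more_output(labour_mult))

print(more_input_more_output(10))  # 5.5 -> the "~5X" above
print(bottlenecked(10))            # 2.0 -> bottlenecks dominate
```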

10 of 27

Scary possibility: rapid software* progress without additional compute.

Suppose software progress is 10X faster than it is today (arithmetic sketched below):

  • The efficiency of training algorithms improves by ~4 OOMs in a year.
    • Recent improvement has been ~0.4 OOMs per year (Epoch).
    • That’s a jump similar to GPT-2 → GPT-4.
  • Each OOM of runtime efficiency expands your AGI workforce by 10X!
  • Ten years of gains from scaffolding, data, prompting, and tool-use!

*I use “software” to include all non-compute sources of progress: algorithms, data, prompting, scaffolding, tool-use…
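
The back-of-the-envelope behind those numbers (my own arithmetic, using the Epoch figure above):

```python
# Back-of-the-envelope for the "10X faster software progress" scenario.
baseline_ooms_per_year = 0.4   # recent training-efficiency gains (Epoch)
assumed_speedup = 10           # scenario: AGI labour makes progress 10X faster

accelerated_ooms_per_year = baseline_ooms_per_year * assumed_speedup  # ~4 OOMs
workforce_growth_per_runtime_oom = 10  # 1 OOM cheaper inference -> 10X more AGI copies

print(f"Training-efficiency gain in one year: ~{accelerated_ooms_per_year:.0f} OOMs")
print(f"Workforce multiplier per OOM of runtime efficiency: {workforce_growth_per_runtime_oom}X")
```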

11 of 27

Scary possibility: rapid software* progress without additional compute.

Three objections:

  1. Diminishing returns – software progress becomes increasingly difficult.
  2. Re-training takes many months.
  3. Computational experiments take weeks or months.

12 of 27

Objection 1: diminishing returns to software progress

13 of 27

Objection 1: diminishing returns to software progress

Data from ImageNet efficiency improvements (h/t Epoch)

14 of 27

Objection 2: re-training takes many months.

  • Labs can “swallow” ~0.5 OOMs of software progress to cut training time from 3 months to 1 month (arithmetic sketched below this list).
    • You et al. (2019) trained BERT in 76 minutes, down from 3 days (a >50X reduction).
  • Existing sources of improvement don’t require retraining from scratch:
    • Fine-tuning
    • Scaffolding
    • Prompting
    • Tool-use
    • Efficiency gains (e.g. quantization, FlashAttention)
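
The swallowing arithmetic referenced above (the BERT ratio is just a sanity check on how far wall-clock reductions have gone):

```python
import math

# Cutting training time from 3 months to 1 month at fixed compute "uses up"
# roughly half an order of magnitude of efficiency gains.
ooms_swallowed = math.log10(3 / 1)          # ~0.48 OOM
print(f"3 months -> 1 month swallows ~{ooms_swallowed:.2f} OOMs")

# You et al. (2019): BERT trained in 76 minutes instead of 3 days.
bert_speedup = (3 * 24 * 60) / 76           # ~57X, i.e. the ">50X" cited above
print(f"BERT wall-clock speedup: ~{bert_speedup:.0f}X "
      f"(~{math.log10(bert_speedup):.1f} OOMs)")
```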

15 of 27

Objection 3: computational experiments take weeks or months

  • Improve experiment design, execution, and interpretation.
  • Run experiments at smaller scales and for shorter durations.
    • Especially for efficiency improvements.
  • Existing sources of improvement don’t require large experiments (fine-tuning, scaffolding, prompting, etc.)
  • New sources of improvement
    • Leveraging interpretability for gradient updates.
    • Personalised curricula.
    • Reward for intermediate thoughts.

16 of 27

Plan

  • What has driven recent AI progress?
  • How much might AGI accelerate AI progress?
  • What risks could that pose?
  • How can labs reduce the risks?

17 of 27

We should proceed very cautiously around AGI

Dangerous capabilities

  • It’s hard to predict what dangerous capabilities new models will have.
  • More capable models are more dangerous.
  • Superhuman AI may have extremely dangerous capabilities.

Alignment

  • We can’t reliably control the behaviour of today’s AI systems.
  • We don’t have a scalable alignment solution.
  • Significant novel alignment challenges may emerge for superhuman AI.

18 of 27

AI acceleration could make it hard to proceed cautiously

  1. A lab might behave irresponsibly.
  2. A lab might worry that some other lab is behaving irresponsibly.
  3. A bad actor might steal the weights and then behave irresponsibly.
  4. A lab might worry about #3…

→ General destabilisation.

19 of 27

Threat models

  • A bad actor develops superhuman AI → creates a bio weapon
  • An irresponsible lab develops misaligned superintelligence → global catastrophe
  • A lab develops aligned superintelligence → gets undemocratic amounts of power
  • AI advances too quickly for society to adjust → massive societal unrest

20 of 27

Plan

  • What has driven recent AI progress?
  • How much might AGI accelerate AI progress?
  • What risks could that pose?
  • How can labs reduce the risks?

21 of 27

Early warning signs

[Flowchart] How much will a new model accelerate software progress?

  • Inputs: measure the historical rate of software progress, evals on AI R&D tasks, RCTs + surveys.
  • → Forecast software progress after deploying the new model.
  • → Take protective measures.

22 of 27

Early warning signs – when should labs pause?

Either:

  1. AI has already doubled the pace of software progress.
    • What? In <18 months the lab makes software progress that would have taken 3 years at the rate observed in 2020-23 (a check is sketched below this list).
    • How can we tell? Measure the pace of ongoing software progress.
    • RCTs + surveys can provide evidence that the acceleration is due to AI.
  2. AI can autonomously perform a wide range of AI R&D tasks.
    • What? Enough R&D tasks that the lab can’t confidently rule out AI soon significantly accelerating AI progress.
    • How can we tell? Build an eval of AI capabilities on a range of AI R&D tasks.
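
A sketch of the doubling check in criterion 1 (the baseline rate and the measurement interface are hypothetical placeholders):

```python
# Hypothetical check for warning sign 1: has AI doubled the pace of software
# progress relative to the 2020-23 baseline?
BASELINE_OOMS_PER_YEAR = 0.4   # placeholder 2020-23 rate of software progress

def pace_has_doubled(ooms_in_last_18_months: float) -> bool:
    # Doubling means: in 18 months, make the software progress that would
    # have taken 3 years (36 months) at the baseline rate.
    three_baseline_years = BASELINE_OOMS_PER_YEAR * 3.0
    return ooms_in_last_18_months >= three_baseline_years

print(pace_has_doubled(0.9))   # False: ~1.5X the baseline pace
print(pace_has_doubled(1.3))   # True: >= 1.2 OOMs, i.e. 3 baseline-years of progress
```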

23 of 27

Protective measures

  • External oversight: the lab must get sign-off on high-stakes decisions (e.g. rapidly improving their AI capabilities) from a responsible third party.
  • Info security that is robust to state actors and to the lab’s own AIs.
  • Alignment, boxing, and internal monitoring to prevent AI from poisoning AI development.
  • Prioritise alignment automation: e.g. a commitment to use >2X as much inference compute to solve AI alignment as to improve AI capabilities.
  • Better measurement: comprehensive measurements and forecasts of the pace of software progress.
  • AI development speed limits: an RSP that explicitly describes how fast the lab could safely improve their AI capabilities given their current protective measures.

24 of 27

What should labs do today?

  1. State that AI that significantly accelerates their AI progress could be dangerous.
    1. Dangerous capability threshold: AI that could accelerate software progress by >=3X.
  2. Monitor for two early warning signs:
    • Build an eval for whether AI can autonomously perform a wide range of AI R&D tasks.
    • Measure whether the pace of software progress has accelerated by 2X.
  3. Prepare protective measures to handle that dangerous capability.
  4. Commit to pause if either warning sign is hit before the protective measures are in place.

This is a responsible scaling policy for the risk of AI accelerating AI progress.

25 of 27

What should labs do today?

Why now?

  • Algorithmic progress is already very fast.
  • AI is differentially good at coding compared to many other activities.
  • If GPT-6 is AGI, GPT-5 will probably cause significant acceleration.

26 of 27

Is this just like the other dangerous capability evals?

Key differences:

  • Evals are hard: how will GPT-5 be integrated into the coding workflow?
    • An eval for significant acceleration might be easier.
  • You can forecast risk in other ways: measure software progress, RCTs.
  • This capability is particularly “dual use”.
  • External oversight is more core to the solution.
    • Illegitimate for a lab to gain massive amounts of power.
    • Mutual reassurance that no lab is dangerously racing ahead.

27 of 27

Questions