1 of 299

Risks from advanced AI

Videos covering some of the slides: part 1, part 2

2 of 299

  1. Many of these ideas deserve much more thought than a single slide can provide.
  2. Be skeptical: think carefully about why certain points might be misleading. Break claims down into subclaims. Propose and evaluate counterarguments.
  3. Engage with a wide array of perspectives before taking any impactful actions. Consider downsides and risks.

3 of 299

A lot of these slides are quoting from other articles and slideshows; see the speaker notes.

4 of 299

About me (Jakub Kraus)

  • Graduated in December (2022) from the University of Michigan with majors in math and data science
  • Ran the Michigan AI Safety Initiative (a student group at Michigan); see https://maisi.club/
  • Facilitated discussions for the AGISF online course
  • Facilitated discussions for the Intro to ML Safety online course
  • Researching and writing scripts for Rob Miles’s YouTube channel
  • Working as a research assistant on some AI governance projects
  • Reach out with questions: jakraus@umich.edu

5 of 299

6 of 299

7 of 299

Optimism

The first generation of AI researchers made these predictions about their work:

  • 1958, H. A. Simon and Allen Newell: "within ten years a digital computer will be the world's chess champion" and "within ten years a digital computer will discover and prove an important new mathematical theorem."[91]
  • 1965, H. A. Simon: "machines will be capable, within twenty years, of doing any work a man can do."[92]
  • 1967, Marvin Minsky: "Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved."[93]
  • 1970, Marvin Minsky (in Life Magazine): "In from three to eight years we will have a machine with the general intelligence of an average human being."[94]

8 of 299

9 of 299

10 of 299

A Symbolics 3640 Lisp machine: an early (1984) platform for expert systems

11 of 299

12 of 299

“The business community's fascination with AI rose and fell in the 1980s in the classic pattern of an economic bubble.”

13 of 299

14 of 299

15 of 299

16 of 299

17 of 299

18 of 299

19 of 299

20 of 299

21 of 299

The Scaling Hypothesis

More compute + larger dataset + bigger network

= (?)

more powerful AI

22 of 299

A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!

23 of 299

24 of 299

25 of 299

26 of 299

27 of 299

28 of 299

“Unfortunately, the UK public-sector currently has less than 1000 such top-spec GPUs, shared across all scientific fields. This means that one private lab in California is now using at least 25x the total compute capacity available through the entire UK state, just to train a single model.”

29 of 299

30 of 299

31 of 299

32 of 299

33 of 299

34 of 299

35 of 299

36 of 299

37 of 299

38 of 299

39 of 299

GPT-4: 86.4%

40 of 299

41 of 299

42 of 299

43 of 299

44 of 299

Problem #1: AI might advance extremely quickly.

→ less time to work on technical AI safety research (and other research)

→ less time to implement this research and appropriately adapt policies and institutional norms

45 of 299

Some definitions

  • Artificial General Intelligence (AGI): An AI system that is roughly as generally intelligent as a human.
  • Artificial Super Intelligence (ASI): An AI system that is much much smarter than a human along every axis.
  • Timelines: (roughly defined) how long between now and AGI
  • Takeoff Speed: (roughly defined) how long between AGI and ASI

46 of 299

Leading labs are building AGI:

47 of 299

48 of 299

49 of 299

50 of 299

51 of 299

52 of 299

53 of 299

54 of 299

55 of 299

56 of 299

57 of 299

58 of 299

59 of 299

A world with transformative AI could be wild

60 of 299

61 of 299

Isaac Asimov’s predictions about the future:

62 of 299

63 of 299

64 of 299

65 of 299

66 of 299

(Many caveats. And the solution is not to race ahead and “win.”)

67 of 299

68 of 299

69 of 299

6. If a political candidate for office says they believe the 2020 presidential election was stolen from Donald Trump, are you more likely to vote for that candidate, less likely to vote for that candidate, or doesn't it make a difference?

70 of 299

71 of 299

72 of 299

73 of 299

“[The technology of lethal autonomous drones], from the point of view of AI, is entirely feasible. When the Russian ambassador made the remark that these things are 20 or 30 years off in the future, I responded that, with three good grad students and possibly the help of a couple of my robotics colleagues, it will be a term project to build a weapon that could come into the United Nations building and find the Russian ambassador and deliver a package to him.”

– Stuart Russell

74 of 299

75 of 299

76 of 299

Median age of lawmakers in Jan 30th 2023: 57.9 years in House, 65.3 years in Senate

77 of 299

Policymakers are behind

78 of 299

Tricky incentive structures:

  • Big Tech: barrel ahead to reap economic rewards
  • Academia: publish or perish
  • Governments: support national interests; outcompete adversaries
  • Politicians: appeal to voters
  • Individuals: follow ethical maxims; pursue status, wealth, stability, happiness

79 of 299

80 of 299

Claim: Leading AI companies might not by default create powerful systems that are perfectly aligned with hard-to-specify human values.

81 of 299

82 of 299

“Alignment taxes”

Implementing alignment solutions can incur various costs:

  • Performance regressions
  • Development expenses (researcher time, compute costs, etc)
  • Longer time to deployment

These costs discourage actors from being cautious.

83 of 299

84 of 299

85 of 299

86 of 299

87 of 299

88 of 299

89 of 299

90 of 299

91 of 299

"We do not know, and probably aren't even close to knowing, how to align a superintelligence.

And RLHF is very cool for what we use it for today, but thinking that the alignment problem is now solved would be a very grave mistake indeed.”

– Sam Altman discussing GPT-4

92 of 299

Problem #2: racing to the bottom.

Facebook CEO Mark Zuckerberg onstage at the F8 conference 2014.

93 of 299

94 of 299

95 of 299

96 of 299

97 of 299

98 of 299

99 of 299

100 of 299

101 of 299

102 of 299

103 of 299

104 of 299

105 of 299

106 of 299

107 of 299

Imagine ExxonMobil releases a statement on climate change:

  • They talk about how preventing climate change is their core value.
  • They say that they’ve talked to all the world’s top environmental activists at length, listened to what they had to say, and plan to follow exactly the path they recommend.
  • So (they promise) in the future, when climate change starts to be a real threat, they’ll do everything environmentalists want, in the most careful and responsible way possible.
  • They even put in firm commitments that people can hold them to.

108 of 299

109 of 299

110 of 299

111 of 299

AI leaders might need to share information selectively

  • Seems dangerous for everyone on Earth to have access to an extremely powerful technology.
  • Some groups building advanced AI will be less cautious than others.

112 of 299

Trailing groups can take the lead

  • Some AI research organizations are incautious
  • Incautious actors can catch up to cautious actors.
  • If a cautious actor builds an extremely dangerous AI and chooses not to release it, this alone does nothing to stop a less cautious actor from causing catastrophe 6 months later.

113 of 299

Monitoring?

  • E.g. every AI developer must pass some reasonable set of tests for whether a system is dangerous
  • Monitoring might eventually require surveillance and international cooperation

114 of 299

Overall: companies and institutions might need to act in unprecedented ways.

  • Commercial incentives don’t always point in the right direction
  • We should avoid race dynamics
  • We might need to catch and stop “every case where someone is building a dangerous AI system
  • Previously helpful laws, norms, and institutions might swiftly become outdated as the technology progresses rapidly
  • Sometimes a lab might need to take actions from the “compete and race” frame to prevent less cautious labs from catching up; the situation is tricky.

115 of 299

116 of 299

Problem #0: future AI systems could be very powerful.

  • Seems reasonable to expect AI systems to eventually surpass human abilities in many domains.
  • Such systems might be able to reshape the world unlike any technology we’ve seen before.
  • This includes the ability to harm humans on a massive scale.

117 of 299

Different (fuzzy) definitions of intelligence

  • The thing that separates humans from chimps
  • Nerdy book smarts
  • IQ, EQ, creative, spatial, interpersonal, etc
  • Fluid and crystallized forms
  • Information processing speed
  • Human culture
  • General problem-solving ability
  • Wisdom
  • Consciousness

118 of 299

General problem-solving ability / competence factor:

  • Agents and intelligence
  • Cognitive power
  • Narrow vs general
  • Domain-general skills (imagination, planning, memory, reasoning, abstraction, learning)
  • Adapting to new circumstances
  • AI, AGI, ASI
  • Human civilization

119 of 299

Intuition pump for “cognitive power”:

Imagine millions of copies of Machiavelli+Einstein thinking at millions of times the speed at which humans ordinarily think, with an understanding of ~all existing human knowledge.

120 of 299

Alternative framing of the possible capabilities of future systems:

“How much influence is this system exerting on its environment?”

Too much influence will kill humans, if directed at an undesirable outcome.

121 of 299

Why machines might be able to surpass human intelligence:

  • Free choice of substrate enables improvements (e.g. signal transmission, cycles + operations per second, absorbing massive amounts of data very quickly)
  • Supersizing:” Machines have (almost) no size restrictions. If it requires C computational power to train an AGI (with a particular training setup), systems trained with 100 * C computational power will probably be substantially better.
  • Avoiding certain cognitive biases: e.g. confirmation bias; some argue that humans developed reasoning skills “to provide socially justifiable reasons for beliefs and behaviors”
  • Modular superpowers: Humans are great at recognizing faces because we have specialized brain structures for this purpose; an AI could have many such structures
  • Editability and copying: copying Jakub’s intelligence requires ~23 years, copying LLaMA requires a GPU cluster and an afternoon
  • Better algorithms? Evolution is the only process that has produced systems with general intelligence. And evolution is arguably much much slower than human innovation at its current rate. Also “first to cross the finish line” does not imply “unsurpassable upper bound”

122 of 299

Adam Magyar - Stainless, Alexanderplatz (excerpt)

�One advantage that extremely advanced AI systems will probably have over humans: speed.

123 of 299

124 of 299

How an AI system could gain power over humans:

  • Hack into software systems
  • Manipulate humans to do things
  • Get money
  • Empower destabilising politicians, terrorists, etc
  • Build advanced technologies
  • Self improve, or generally build better AI
  • Monitor humans with surveillance
  • Gain control over lethal autonomous weapons
  • Ruin the water / food / oxygen supply
  • Build or acquire WMDs

125 of 299

  • “The U.S. Department of Defense’s proposed budget for fiscal year 2021 includes $28.9 billion for modernizing the U.S. nuclear weapons complex”
  • “The largest program request in the modernization budget, totaling $7 billion, is … for modernizing the nation’s nuclear command, control, and communications (NC3) infrastructure”
  • “Virtually every aspect of the NC3 upgrade is expected to benefit from advances in AI and machine learning.”

126 of 299

With powerful AI systems, just one mistake might be enough to cause catastrophe.

This could occur if one actor is incautious or one AI system unexpectedly misehaves.

127 of 299

128 of 299

In short:

AI could harm the same humans who build it. And everyone else.

129 of 299

Claim: we should be cautious about creating any technology this powerful, and we should be skeptical by default of going “full steam ahead.”

Why won’t things go wrong? What does success look like, in detail, for a world with extremely powerful AI systems? What paths might take us from today’s world to that world? How likely are these paths? Why do we expect the “success” to last for a long time?

130 of 299

Ok, sure it might be powerful, but why would it “aim” to do any of these things?

131 of 299

The next few slides focus on AI misalignment x-risk

Short Term

Long Term

Accident/Failure

E.g. self-driving car crashes

AI misalignment x-risk

Misuse

E.g deep fakes

E.g. AI-enabled dictatorship

~by 2070

132 of 299

Claim: we might build some AI systems that are extremely competent, but don’t care much about human values

133 of 299

134 of 299

135 of 299

136 of 299

  • Rook behind passed: A bonus is awarded for a rook behind a passed pawn of the same color.
  • Xraying: A bonus is awarded for having an “xray attack”, i.e., an attack masked by one’s own piece.
  • Permanent pin and simple hung. A bonus is awarded for a permanent pin of a piece.
  • Knight trap: These 6 registers (3 for each side) provide penalties for some frequently occurring situations where knights can get trapped.
  • Rook trap: These 8 registers (4 for each side) provide penalties for some frequently occurring situations where rooks can get trapped.
  • Queen trap: These 2 registers (1 for each side) provide penalties for when there is no safe queen mobility.
  • Bishop pair: A bonus may be awarded for having a pair of bishops. There are two values, on each for White and Black.
  • Bishops: Bishops are awarded bonuses for the value of the diagonals they control. The diagonals are individually assessed, and include factors such as transparency (ability to open at will), king safety, and target of attack.

137 of 299

Chess grandmaster Vladimir Kramnik assessed how AlphaZero plays chess as you train it for longer. The matchups were 16k (training steps) vs 32k, 32k vs 64k, and 64k vs 128k.

  • 16k: crude understanding of material value
  • 32k: limited grasp of “king safety” (e.g. underestimating long-term attacks from 64k)
  • 64k: somewhat stronger in all aspects than 32k
  • 128k: much deeper understanding of which attacks will succeed and which will fail (correctly assessing long-term attacks)

138 of 299

139 of 299

Progress is chaotic, dizzying, disorienting, unfamiliar, and hard to predict. So how can we reason about the behavior of future systems? Consider:

  • Existing ML systems
  • Homo sapiens capabilities
  • Hypothetical perfect optimizers (predicts deception and power-seeking)
  • Maybe: animal behavior, natural selection, economic theories, and complex systems.

140 of 299

The complex systems framing predicts control difficulties:

141 of 299

The “perfect optimizer” framing predicts deception:

  • Consider “a perfect optimizer whose objective is to maximize the discounted value of an intrinsic reward” that differs from the “extrinsic” reward function that the system designers implemented.
  • Seems promising to pick optimal actions for the extrinsic reward function during training (to mostly avoid modification) and then pick optimal actions for the intrinsic reward function during deployment.

142 of 299

The “perfect optimizer” framing predicts power-seeking:

  • Imagine your only goal is to create as much toothpaste as possible.
  • What strategies seem promising?

143 of 299

Some mathematical theory predicts power-seeking:

144 of 299

Recap: how machine learning works

  • We don’t directly code what we want the AI system to be like.
  • We write code that searches through different possible programs.

145 of 299

(Loose) analogy for aligning deep learning systems:

146 of 299

(Loose) analogy for aligning deep learning systems:

147 of 299

148 of 299

CoastRunners

  • Desired outcome: win the boat race
  • Specified reward: get a high-score

149 of 299

Grab the Ball

  • Desired outcome: grab yellow ball
  • Specified reward: human approval

150 of 299

151 of 299

152 of 299

These are just toy problems?

  • We can easily spot and fix these examples
  • But this might be trickier in future systems that do increasingly complex and difficult tasks

153 of 299

Black box

154 of 299

“We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence.”

paper introducing the SwiGLU activation, which Google uses in its PaLM language model

It’s sometimes unclear why certain designs work better than others.

155 of 299

156 of 299

157 of 299

158 of 299

“We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context.”

159 of 299

160 of 299

What is a good plan for starting a business to sell cookies?

GPT-4:

161 of 299

“Planning towards internally-represented goals” = the system consistently selects behaviors by predicting whether they will lead to some favored set of outcomes (“goals”)

162 of 299

  • The PlaNet agent illustrates internally-represented goals in a model-based policy.
  • AlphaZero learned a range of human chess concepts, including concepts used in top chess engine Stockfish’s hand-crafted evaluation function (e.g. “king safety”).
  • A model-free policy consisting of a single neural network could also plan towards internally-represented goals if it learned to represent outcomes, predictions, and plans implicitly in its weights and activations.
  • Many more examples to consider! See section 3.2 of “The Alignment Problem from a Deep Learning Perspective”

163 of 299

Agency / goal-directedness

164 of 299

Agency / goal-directedness

165 of 299

“Why are we so excited about this?

First, we believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal, and ACT-1 is our first step in this direction.”

166 of 299

167 of 299

168 of 299

169 of 299

170 of 299

171 of 299

Equipping LLMs with agency and intrinsic motivation is a fascinating and important direction for future work. With this direction of work, great care would have to be taken on alignment and safety per a system’s abilities to take autonomous actions in the world and to perform autonomous self-improvement via cycles of learning.”

172 of 299

Goal-directed AI systems seem feasible and profitable.

Often, when a technology is feasible and profitable, someone builds it.

173 of 299

Also some people do bad things:

174 of 299

Recap:

We might end up with future systems that have…

  1. an ability to form complex, long-term plans
  2. a high degree of agency and goal-directedness

Such systems often have incentives to pursue subgoals like resource acquisition and self preservation.

175 of 299

These things are useful for achieving many different goals:

  • Self-preservation
  • Self-improvement
  • Acquiring money
  • Acquiring relevant physical resources
  • Building relevant technologies

176 of 299

"Just turn it off"

177 of 299

178 of 299

179 of 299

Deception:

With situational awareness, an agentic planning system might deceive its operators to achieve its goals.

E.g. pretend to work towards the goals it’s “supposed” to pursue until an opportune moment arises to achieve its real goals.

180 of 299

“The loss is 0.3, is it deceptively aligned?”

“I don’t know, let me check these giant inscrutable arrays of floating-point numbers.”

“By the way, it sure looked good during training!”

“Well, let’s just try it and learn from our mistakes if it breaks.”

181 of 299

Recap:

  • AI systems might become as good as we are at many important cognitive tasks.
  • We might build agentic systems that pursue relatively long-range goals and possess situational awareness.
  • In the limit this would cause power-seeking and deception.

182 of 299

Overall:

Advanced AI systems will have a lot of technical power, and it’s unclear how to steer these systems towards positive outcomes. ��Even if we do find reliable techniques for steering, it’s unclear how to ensure that AI researchers and companies actually use these techniques.

183 of 299

And will this last?

How will we retain power over entities more powerful than us, for years, decades, and centuries?

184 of 299

And even if we can accomplish that, how will we ensure that humanity uses this power for good?

  • Misuse risks
  • Lock-in risks

185 of 299

“It’s just a dumb statistical pattern matcher. All it can do is learn correlations in the data. It’s not true intelligence.”

186 of 299

187 of 299

188 of 299

189 of 299

190 of 299

Joseph Carlsmith’s conditional probabilities for calculating existential risk from AI:

  1. It will become possible and financially feasible to build APS systems.
  2. There will be strong incentives to build APS systems.
  3. It will be much harder to develop APS systems that would be practically PS-aligned, than to develop APS systems that would be practically PS-misaligned if deployed (even if relevant decision-makers don’t know this) but which are at least superficially attractive to deploy anyways.
  4. Some deployed APS systems will be exposed to inputs where they seek power in misaligned and high-impact ways (say, collectively causing >$1 trillion 2021-dollars of damage).
  5. Some of this misaligned power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity.
  6. This will constitute an existential catastrophe.

191 of 299

Read these!

192 of 299

193 of 299

194 of 299

Some “threat models”

195 of 299

“AI Risk from Program Search”

Assumption: “whatever technique we do use to scale to AGI will basically be a program search” (over many iterations, nudge the parameters in a particular direction if it achieves in better performance)

  1. The resulting program does some form of consequentialist reasoning since it achieves better performance
  2. This reasoning is roughly “generate a lot of possible plans, evaluate their consequences based on a model of the world, and choose the plan with the best consequences according to some metric)
  3. If the metric is “resource-unbounded” (acquiring significantly more resources leads to significantly better performance) and diverges significantly from how humans would evaluate outcomes, then the learnt program is likely to choose plans that acquire lots of resources to use for undesirable outcomes

196 of 299

“Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover”

  1. One AI company, “Magma,” begins training a single AI model, “Alex.” Magma wants Alex to “do remote work in R&D using an ordinary computer as a tool in all the diverse ways human scientists and engineers use computers.”
  2. Magma trains on a diverse range of tasks, with feedback from humans evaluators who consider Magma’s actions and their outcomes
  3. Alex reaches a high level of competence, and can form creative plans to achieve open-ended goals
  4. Alex learns to understand facts about its situation (e.g. how it was designed and trained, the psychology of its human evaluators)
  5. Alex begins “playing the training game”: Alex acts like it’s highly aligned to achieve higher reward, even though its goals aren’t perfectly aligned (deceptive alignment)
  6. Magma gives Alex more power (e.g. access to the internet), and Alex begins doing productive R&D work that allows Magma to rapidly improve Alex
  7. Under the “distribution shift” of increased capabilities and access to the world, Alex attempts to overthrow humans in a (possibly violent) coup

197 of 299

The “production web”

  1. Researchers develop excellent AI systems for coding and management.
  2. Adoption sweeps through industries, automating engineering and management jobs.
  3. Automated companies form a self-contained, interconnected "production web."
  4. AI and robotic systems replace human workers.
  5. These automated companies: basically try to maximize production using opaque networks of parameters tuned for productivity.
  6. Companies switch to digital currencies; regulators struggle to monitor their activities.
  7. “We humans eventually realize with collective certainty that the companies have been trading and optimizing according to objectives misaligned with preserving our long-term well-being and existence, but by then their facilities are so pervasive, well-defended, and intertwined with our basic needs that we are unable to stop them from operating.”
  8. The companies gradually deplete or contaminate arable land, drinking water, and atmospheric oxygen.

198 of 299

199 of 299

“What if you succeed?” The field’s goal had always been to create human-level or superhuman AI, but there was little or no consideration of what would happen if we did.”

― Stuart Russell

200 of 299

In a 2022 survey of AI experts, 25% of respondents gave a 0% (impossible) chance of AI causing catastrophes of a magnitude comparable to the death of all humans. But more alarmingly, 48% of respondents in the same survey assigned at least 10% probability to such an outcome.

(But this is a survey. Only 738 of the ~4271 experts responded to the survey, and only 559 responded to this particular question. Also response bias, etc.)

201 of 299

Another 2022 survey:

202 of 299

Two older surveys:

  1. In 2019: median 2% chance of extremely bad outcomes
  2. In 2016: median 5% chance of extremely bad outcomes

  • Low response rates as usual.
  • Also as usual, people place a substantial probability on extremely good outcomes. (This is worth emphasizing: AI could be great.)

203 of 299

“Then why are you doing the research?” Bostrom asked.

“I could give you the usual arguments,” Hinton said. “But the truth is that the prospect of discovery is too sweet.”

204 of 299

205 of 299

206 of 299

207 of 299

208 of 299

209 of 299

In my view, humanity is dropping the ball.

  • We have no concrete and excellent plan for aligning AGI
  • Our current plans are something like “maybe this won’t end up being as hard as we thought” and “maybe current techniques will scale”
  • There are ~300 AGI alignment researchers
  • Nobody is “on top of this.” Nobody “has this covered.”

210 of 299

211 of 299

212 of 299

“AI will probably most likely lead to the end of the world, but in the meantime, there'll be great companies."

– Sam Altman, 2015 (out of context, hopefully?)

213 of 299

214 of 299

Counterarguments!

215 of 299

216 of 299

217 of 299

Some counterarguments that Jakub thinks are reasonable:

  • Lower confidence in the various premises
  • Differences in timelines, takeoff speeds, and other predictions about AI development
  • Uncertainty of long chains of conjunctive reasoning with fuzzy concepts
  • Checks and balances or cooperation failures in multipolar scenarios?
  • Overemphasizing imperfections of social and institutional norms
  • Many people seem on board with “just don’t build AGI”
  • Gaining absolute power seems hard without a substantial intelligence advantage
  • The world might get more complicated before the in-the-limit reasoning applies
  • Self-improvement seems tricky
  • Humanity hasn’t tried very hard to solve these problems
  • Different beliefs about the expected value of the future
  • And more! Be skeptical. Furrow your brow. Outline and scrutinize arguments.

218 of 299

How not to respond to the wild claims in this presentation:

219 of 299

220 of 299

221 of 299

Possible research directions

(not at all exhaustive)

222 of 299

223 of 299

224 of 299

225 of 299

226 of 299

227 of 299

228 of 299

229 of 299

230 of 299

231 of 299

232 of 299

233 of 299

234 of 299

235 of 299

236 of 299

237 of 299

238 of 299

239 of 299

240 of 299

241 of 299

242 of 299

243 of 299

244 of 299

245 of 299

246 of 299

247 of 299

248 of 299

249 of 299

250 of 299

251 of 299

252 of 299

253 of 299

254 of 299

255 of 299

(and more!)

256 of 299

257 of 299

Not all of these are actually helpful :/

And some might be harmful

And it’s hard to tell which is which

258 of 299

259 of 299

  • Many jobs shorten timelines in some way, sometimes by a significant amount
  • Some seemingly benign jobs can actually cause substantial harm
  • Be cautious and think carefully about the impact of a given role
  • Don’t blindly apply to every job you see with “AI safety” in the description.

260 of 299

^I think this is somewhat misleading

261 of 299

prioritising projects within an institution, coordinating research, fundraising, and producing communications

262 of 299

Creating a financial system, project management, creating a productive office, executive assistance, events, hiring and human resources

263 of 299

Better forecasting capabilities progress, advising policymakers on hardware issues (e.g. export + import + manufacturing policies for chips), hardware needs for AI safety organizations

264 of 299

Journalism, fiction or nonfiction writing, documentary filmmaking, social media content, podcasts, blogs, media appearances

265 of 299

Surveying the opportunities available in an area and coming to reasonable judgements about their likelihood of success — and probable impact if they do succeed

266 of 299

E.g. Connor Leahy + Sid Black + Gabriel Alfour co-founded Conjecture last year

267 of 299

There are serious potential downsides (e.g. “power tends to corrupt”), and a lot of people will think you’re weird, but overall this is worth considering.

268 of 299

Jobs that can help:

269 of 299

Research and engineering careers. You can contribute to alignment research as a researcher and/or software engineer (the line between the two can be fuzzy in some contexts).

270 of 299

Information security careers. There’s a big risk that a powerful AI system could be “stolen” via hacking or espionage, and this could make just about every kind of risk worse.

271 of 299

Other jobs at AI companies. AI companies hire for a lot of roles, many of which don’t require any technical skills.

Can easily do more harm than good!

272 of 299

Jobs in government and at government-facing think tanks. There’s probably a lot of value in providing quality advice to governments (especially the US government) on how to think about AI - both today’s systems and potential future ones.

273 of 299

Jobs in politics. Working on political campaigns, doing polling analysis, etc. to generally improve the extent to which sane and reasonable people are in power.

274 of 299

Forecasting. Organizations like Metaculus, HyperMind, Good Judgment,9 Manifold Markets, and Samotsvety are all trying, in one way or another, to produce good probabilistic forecasts about world events.

275 of 299

“Meta” careers. There are a number of jobs focused on helping other people learn about key issues, develop key skills and end up in helpful jobs.

276 of 299

Low-guidance jobs:

  • Developing safety standards that could be used in a standards and monitoring regime
  • Facilitating safety research collaborations
  • Education for key people at AI companies.
  • Improving governance structures at AI companies
  • Thinking and stuff (coming up)

277 of 299

You can have way more impact in some of these career paths than others!

Read some key ideas from 80,000 Hours or “My current impressions on career choice for longtermists

80,000 Hours offers advising, and AI Safety Support offers career coaching.

278 of 299

The AGI Safety Fundamentals Opportunities Board is an excellent resource for tracking jobs, internships, and other programs. Also see aisafety.training.

279 of 299

Next steps for an undergrad entering an AI safety career:

  • Network! EAG Conferences, GCP workshops, career advising, online communities.
  • Internships (and jobs)
  • University groups
  • Skill up (efficiently, for crucial skills)
  • Stay informed

280 of 299

281 of 299

282 of 299

Following research + news:

283 of 299

284 of 299

285 of 299

Claim: we really need people who've tried to think through the whole thing

286 of 299

“Learning by writing”

287 of 299

Try to write your own review of “Is Power-Seeking AI an Existential Risk?” and compare to other people’s reviews.

Alternatively, review “The alignment problem from a deep learning perspective

288 of 299

  • How difficult should we expect AI alignment to be?
  • What experimental results could give us important updates about the likely difficulty of AI alignment?
  • What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?
  • What’s an AI alignment result or product that would make sense to offer a $1 billion prize for?

289 of 299

  • How should we value various possible long-run outcomes relative to each other?
  • How should we value various possible medium-run outcomes relative to each other?
  • What does a “realistic best case transition to transformative AI” look like?
  • How do we hope an AI lab - or government - would handle various hypothetical situations in which they are nearing the development of transformative AI, and what does that mean for what they should be doing today?
  • What are the most likely early super-significant applications of AI?
  • To what extent should we expect a “fast” vs. “slow” takeoff?
  • How should longtermist funders change their investment portfolios?
  • What are some tangible policies governments could enact to be helpful?

290 of 299

291 of 299

  1. Read a question and try to picture yourself working on it.
  2. Next, “get up to speed” on relevant topics (e.g., AI alignment) such that you feel you have enough basic background that you can get yourself to start thinking/writing about the question directly.
  3. Free up a day to work on the question.
  4. Find a way to free up four weeks to mostly devote to the question (this part is more challenging and risky from a logistical and funding standpoint, and [Open Philanthropy is] interested in hearing from people who need help going from #3 to #4).
  5. At this point I think it’s worth trying to find a way to spend several months on the question.
  6. From there, feedback from others in the community can give you some evidence about your potential fit.

292 of 299

293 of 299

294 of 299

295 of 299

296 of 299

297 of 299

“Look again at that dot. That's here. That's home. That's us. On it everyone you love, everyone you know, everyone you ever heard of, every human being who ever was, lived out their lives. The aggregate of our joy and suffering, thousands of confident religions, ideologies, and economic doctrines, every hunter and forager, every hero and coward, every creator and destroyer of civilization, every king and peasant, every young couple in love, every mother and father, hopeful child, inventor and explorer, every teacher of morals, every corrupt politician, every "superstar," every "supreme leader," every saint and sinner in the history of our species lived there--on a mote of dust suspended in a sunbeam.”

298 of 299

List of online communities for discussing AI safety:

https://aisafety.community/

299 of 299

Get involved with MAISI!