Potential Existential Risks from Artificial Intelligence

Ben Garfinkel
Future of Humanity Institute (University of Oxford)


Plan for the talk

  1. Background on artificial intelligence
  2. Analysis of future possibilities
  3. Outline of potential risks
  4. Takeaways


Background


What is AI?

  • AI system: An engineered system that can perform impressive cognitive tasks
    • Example tasks: Playing games, writing articles, recognizing faces, navigating vehicles
  • Exactly what counts as “AI” is a bit arbitrary and depends on who you ask


History of the field

  • Founded in 1956 with the aim of recreating “every aspect of learning or any other feature of intelligence” in machines
  • In the past decade, researchers in the field have achieved several major breakthroughs. Resources and attention have grown accordingly.
  • There now exists a wide range of highly or moderately specialized systems capable of performing particular tasks

Some present-day capabilities

[Image slides: examples of present-day AI capabilities]

Future developments


Where is the field headed?

  • Focus of this talk will not be on present-day systems
  • The approach is instead to ask: “Suppose the field of artificial intelligence achieves what most of its researchers expect it to achieve. What risks might then arise?”


Human-substitute AI (or “Human-level AI”)

  • Most AI researchers expect that AI systems will eventually be able to perform any cognitive task a person can
    • Collectively, such systems might be called human-substitute AI (HSAI)
    • Basic argument for HSAI: If blind natural selection could produce these abilities, why shouldn’t human scientists/engineers be able to (given enough time and effort)?
  • One survey suggests the median researcher believes this will most likely happen in the next half-century (Grace et al., 2017)


Studying a world with HSAI

  • Some prominent economists have begun to analyze potential properties of a world with HSAI (Brynjolfsson and McAfee, 2014; Nordhaus, 2015; Hanson, 2016; Aghion et al., 2017; Korinek and Stiglitz, 2017)
  • Common suggestions are that growth and the pace of innovation may be radically higher and that (unsurprisingly) demand for labor may be radically lower


Superintelligence

  • Very likely a world with HSAI would be one where the cognitive abilities of AI systems vastly exceed those of humans
    • Digital systems can already dramatically outperform humans at many cognitive tasks (from chess to long division)
    • Unlikely that human brains are close to the physical limits of performance for most tasks, especially given their limited processing speed
  • A world with HSAI would therefore likely be characterized by the existence of superintelligence (Bostrom, 2014). It should not be imagined as a world where human workers have just been “replaced” by similarly capable machines.


Major uncertainties: Specialization

  • Unclear how cognitive abilities would be distributed across AI systems
  • Some imagine large collections of narrowly competent systems, each providing some limited service (Drexler, 2019)
  • Others imagine generally competent, agent-like systems (Bostrom, 2014)
  • Systems could also be spread across many degrees of generality and “agency”


Major uncertainties: Continuity

  • Unclear how “continuous” we should expect a transition to a world with HSAI to be
  • Some argue it should be fairly continuous, in analogy with previous economic transitions (Hanson, 2016)
  • Others expect the transition to look more like a sudden jump (Bostrom, 2014), perhaps due to a discrete breakthrough


Potential risks


  • Prominent AI researchers (e.g. Turing, Good, Minsky) have long expressed serious concerns about the potential long-run implications of “success,” as does the leading textbook (Russell and Norvig, 2015).
  • The median AI researcher reports a 5% credence that the outcome of HSAI would be roughly as bad as human extinction (Grace et al., 2017).
  • We certainly don’t know that HSAI will ever arise. But it still seems plausible enough to make the potential risks worth considering


Diversity in existential risk concerns

  • There is no single argument for why a transition to HSAI might pose existential risks. Concerns are diverse.
  • Will aim to run through four somewhat distinct genres of existential risk concerns:
    • Instability
    • Bad attractor state
    • Lock-in of contingencies
    • Technical failure


Potential risks: Instability


Instability

  • Some worry that progress in AI could heighten risks associated with war, social collapse, or terrorism.
    • These risks could either cause permanent damage (i.e. be existential risks in their own right) or decrease our ability to manage other existential risks
  • Concerns about heightened risk of great power war are probably most salient from a long-term perspective (Dafoe, 2018; Horowitz, 2018; Scharre, 2018)
    • These concerns emphasize greater potential for rapid power shifts, changes in key strategic parameters, and miscalculation
  • Much more could be said…


Potential risks: Bad attractor state


Bad attractor state

  • Some worry that a transition to HSAI could by default result in particular long-lasting changes in institutions or value systems, with no guarantee these changes will be desirable (Dafoe, 2018)
  • This concern is associated with an at least somewhat technologically determinist perspective


Technological determinism

  • A weak determinist hypothesis: The probabilities of various institutions and value systems existing in a society are heavily influenced by the technologies in use (Morris, 2015)
  • Evidence: Regularities in the kinds of institutions and value systems observed within forager, farmer, and industrialized societies (see the table below)


|                        | Forager societies            | Farmer societies                | Industrialized societies       |
|------------------------|------------------------------|---------------------------------|--------------------------------|
| Political institutions | Consensus-based              | Hierarchical, rarely democratic | Hierarchical, often democratic |
| Living standards       | Low, stagnant                | Low, stagnant                   | High, rising                   |
| Forced labor           | Rare                         | Common                          | Rare                           |
| Inequality             | Low/medium                   | High                            | High                           |
| Gender roles           | Somewhat distinct (probably) | Highly distinct                 | Somewhat distinct              |
| Interpersonal violence | High (probably)              | Moderate                        | Low                            |


Assessing previous major economic revolutions

  • Neolithic Revolution arguably brought a long-lasting and hard-to-escape net decrease in average welfare and freedom (Diamond, 1997; Scott, 2017)
  • Industrial Revolution has arguably brought a long-lasting and continuing increase in average welfare and freedom
    • Striking that many celebrated positive global trends (Pinker, 2018; Rosling, 2018) are specifically industrial-era trends


Assessing the AI Revolution: Concerns about humans

  • Should not take it for granted that the effects of an AI Revolution on humans would be on net desirable, given the mixed record of the previous two major economic revolutions
  • Concerns often emphasize the potential dependence of most people on elites, in a world where human labor has little value and law enforcement is highly automated (Drexler, 1986; Joy, 2000)
    • One specific concern is that the economic factors that help to explain the unusual prevalence of democracy in the industrial era (Acemoglu and Robinson, 2005) would no longer hold. Some warn of a potential stable shift toward totalitarianism (Dafoe, 2018).
    • Related concerns about inequality (Korinek and Stiglitz, 2017), welfare, and political violence have also been raised.


Assessing the AI Revolution: Concerns about AI systems

  • On some views, certain kinds of AI systems (“digital minds”) might be capable of having morally relevant experiences and might eventually operate as highly autonomous agents
  • Suppose we accept this. One might then worry that competitive pressures would shape the likely experiences of digital minds in semi-deterministic ways (Bostrom, 2004; Hanson, 2015)
    • Ease of replication might imply highly competitive, Malthusian conditions for societies of digital minds. These conditions would be associated with strong selection pressure toward maximally productive designs, which might not allow for experiences we would regard as very valuable.
    • One specific version of this concern is that gains from specialization might render the persistence of AI systems with very general cognition untenable. And very general cognition might be necessary to have morally relevant experiences.


Conditional technological determinism

  • A completely deterministic perspective would suggest there’s nothing to be done to ensure the transition to a world with HSAI has a good outcome (besides hope for the best)
  • A slightly more action-guiding perspective, then, might be that certain changes in institutions and values are nearly inevitable in the absence of strong proactive global coordination
    • On this perspective, the effects of previous major technological transitions have been deterministic only insofar as they have been guided primarily by competitive pressures.
  • This sort of coordination seems at least more plausible now than it was in previous eras. This suggests a potential precautionary motivation for working toward stronger global institutions (Bostrom, 2018).
    • However, even in this case, there’s the question of whether the necessary levels of coordination could be sustained indefinitely.


Potential risks: Lock-in of contingencies


Lock-in concerns

  • Another, almost opposite set of concerns instead emphasizes a particular kind of historical contingency
  • Certain historically contingent features of the world seem to have become “locked in” for long periods of time:
    • Political institutions (e.g. US Constitution)
    • Values (e.g. Confucianism)
    • Technology standards (e.g. Internet protocols)
    • Power relations (e.g. post-WW2 world order)
  • Could we expect some contingent features of the world to lock in for an especially long time around HSAI?


The “choosing our successors” argument

  • Some suggest that digital minds might ultimately function as “successors” to the human species (Hanson, 2016)
  • Insofar as one accepts this viewpoint, one might imagine there will be opportunities to “lock in” the values and other traits of future generations (i.e. the designs of future digital minds) in a way that has never been possible before
  • Even small differences in values or traits could be highly significant overall if they persist far enough into the future. Failures to make good choices (e.g. regarding the presence or character of conscious experience in AI systems, the relationship between humans and AI systems, or even just AI ‘population levels’) could constitute moral catastrophes.
  • No obvious reason to expect morally consequential choices to be made even close to as well as possible (given the complexity of the relevant moral questions and the very questionable moral track record of powerful groups or humanity as a whole)


Potential risks: Technical failure


Technical failure

  • A final class of concerns is that engineering failures will result in AI systems that engage in unexpected behaviors that their users regard as harmful (“technical failures”)
  • Some argue that, in the future, technical failures could be severe enough to constitute existential catastrophes


Two kinds of failure: Incompetence

  • Some technical failures are best attributed to the incompetence of AI systems
  • Examples might include:
    • A self-driving car veering off the road
    • An image recognition system misclassifying a civilian as a military target
  • This category of failure is fairly straightforward and has no obviously important disanalogy with the sorts of failures seen in most other technologies (e.g. bridges falling down).


Two kinds of failure: Misalignment

  • Some technical failures are best attributed to the competent pursuit of goals that diverge from those of the system’s users
  • Consider a predictive policing algorithm with an unintended disparate impact
    • Here the system’s only “goal” is predictive accuracy. The user’s goals are predictive accuracy and (implicitly) avoidance of disparate impact.
  • This sort of failure is arguably more analogous to the sorts of failures we see in institutions (e.g. a profit-maximizing corporation causing social harms not desired by its shareholders) than to typical technological failures.
  • An AI system with a tendency to fail in this way is said to be “misaligned.”


Explaining misalignment: one story

  • AI systems with “simple” goals are often easier to create than systems with “nuanced” goals
  • Unfortunately, many activities humans might want to automate are associated with highly “nuanced” goals
    • Examples: Policing, urban planning, military strategy, house cleaning
  • Alignment failures may emerge in cases where people choose to create and deploy systems with goals that are excessively “simple,” relative to the complexity of their users’ goals and the range of behaviors available to the systems (see the toy sketch below)
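To make this story slightly more concrete, here is a minimal toy sketch in Python (purely illustrative; the policy names and numbers are hypothetical and not from the talk). A system optimized only for a “simple” proxy objective competently selects an option its users would not endorse, because the “nuanced” part of the users’ goal goes unmeasured.

```python
# Toy illustration of proxy misspecification (hypothetical numbers).
# The deployed system optimizes only a "simple" proxy score, while the user
# also cares about an unmeasured component of the outcome.

policies = {
    "cautious":   {"proxy": 0.80, "unmeasured": 0.90},
    "balanced":   {"proxy": 0.88, "unmeasured": 0.70},
    "aggressive": {"proxy": 0.95, "unmeasured": 0.10},
}

def user_goal(p):
    # What the user actually cares about: both components matter.
    return 0.5 * p["proxy"] + 0.5 * p["unmeasured"]

def system_goal(p):
    # What the system is optimized for: only the simple proxy.
    return p["proxy"]

chosen_by_system = max(policies, key=lambda name: system_goal(policies[name]))
best_for_user = max(policies, key=lambda name: user_goal(policies[name]))

print("System selects:", chosen_by_system)  # "aggressive": highest proxy score
print("User would prefer:", best_for_user)  # "cautious": best overall outcome
```

In this toy setup the system is not incompetent: it does exactly what its objective rewards. The failure comes from the gap between the simple objective it was given and the more nuanced goal its users actually hold.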


Existential misalignment risk

  • Some suggest that, in a world with HSAI, alignment failures might rise to the level of existential risks (Bostrom, 2014)
  • Concerns about existentially significant alignment failures have been voiced by the author of the leading textbook on artificial intelligence (Stuart Russell) and a co-founder of the leading AI research lab (Shane Legg)


Diversity in alignment risk concerns

  • Researchers concerned about existentially significant misalignment often have very different visions of the risk, or stories to explain why it is plausible/likely.
  • Some describe a gradual and hard-to-reverse slide into a world that fails to reflect many of the nuances of human values (Christiano, 2019). There is a close link to “lock-in” concerns.
  • Others describe vivid catastrophes caused by highly autonomous AI agents that are both broadly superintelligent and extremely misaligned (Bostrom, 2014).
    • In their strongest versions, these accounts of the risk imagine AI systems appropriating vital resources in the pursuit of goals that overlap very little with our own. The suggested result is humans becoming enfeebled or even driven toward extinction, in a manner that evokes the impact humans have had on (e.g.) chimpanzees.


Conclusions


Summing up

  • A growing number of researchers in a wide range of disciplines have expressed concerns about potential existential risks from artificial intelligence.
  • These concerns are extremely heterogeneous.
  • It is also still unclear how much credence one should give each concern. Many are fairly abstract and have only recently begun to receive significant academic study.


Summing up

  • Overall, it does still seem fair to classify existential risks from artificial intelligence as more “speculative” than existential risks from other well-known sources (e.g. nuclear war)
  • Nonetheless, given the potential stakes, these proposed risks seem more than worthy of serious investigation and concern
  • Any field where a typical researcher assigns a 5% probability to an outcome (roughly) as bad as human extinction is certainly worth paying some attention to