AI Safety

What risk does advanced AI pose, and what can we do?

Aditya Prasad, PhD, CDS, IISc

Who am I?

Agenda for the talk

  1. Effective Altruism and an intro to longtermism, a school of thought

  2. AI Safety and the alignment problem

Effective Altruism

Is the problem important? Unfairly neglected? Tractable? Do your solutions scale?

Effectiveness = doing the most good with the resources we have.

Inadequacy

Scope Insensitivity

Malaria is one of the leading causes of child mortality; it kills about half a million children every year.

To counter this cognitive bias, we can think of a single kid, Arun. He loves spending time in the library and is really good at video games. His brother is dying of malaria. 55 such kids are dying every hour of every day.
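
As a quick sanity check on that per-hour figure, here is a back-of-the-envelope conversion (assuming the half-million-per-year estimate quoted above):

```python
# Rough conversion of ~500,000 annual child malaria deaths to a per-hour rate.
deaths_per_year = 500_000
hours_per_year = 365 * 24                # 8,760 hours in a year
print(deaths_per_year / hours_per_year)  # ~57 per hour, the same ballpark as 55
```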

Broken Care-o-meters

Most of us implicitly trust our internal care-o-meters, but they do not handle scale well. It is time to rethink that: with more power comes more responsibility.

Future Humans

What do we owe future generations? We can act as trustees, helping create a flourishing world for the generations to come.

Longtermism

The question now becomes: “How can I best make the very long-term future go well?”

These arguments and their implications are studied as part of an emerging school of thought called longtermism.

AI Safety

How do we ensure that AI is deployed in ways that do not harm humanity? We want to align the AI’s goals with our own.

Capability Progress

Text to images, and even short videos.

What is AGI?

A system that can perform a diverse enough set of the tasks humans can do that it is dangerous. It can do meta-learning and transfer learning across an enormous variety of cognitive domains, and can optimize across multiple domains without preprogrammed instincts.

It can invent tools and submodels. It has a model of the world and can use compute to search intelligently for an action. It is an agent, not a tool.

Is it possible?

We have one existence proof: humans. If the brain follows the laws of physics, then the substrate should not matter. At the very least, we could reach AGI via whole-brain simulation.

Timelines

The 2022 expert survey found that the aggregate forecast for a 50% chance of HLMI (high-level machine intelligence) was 37 years away, i.e. 2059. On Metaculus, the median forecast is 2028 and the 75th percentile is 2036.

Scaling Hypothesis

Orthogonality Thesis

Instrumental Convergence

AI Alignment

Early detection of deception and power-seeking behaviour

Interpretability research - explainability

Agent foundations - deconfusion of a pre-paradigmatic field

Tool alignment of existing LLMs - model exploration, red teaming

Core Problems

Corrigibility - the shutdown problem (a toy sketch follows on the next slide)
Ontology mismatch - is the natural abstraction hypothesis (NAH) true?
Inner alignment - mesa-optimizers can be misaligned
True names - specifications resistant to Goodharting
Sharp left turn - will alignment generalize the way capabilities do?
Gradient hacking - like how humans reward themselves
Oracle AI - is agentic behaviour inevitable after optimization?

Corrigibility

theturingprize.com
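
To make the shutdown problem concrete, here is a minimal toy sketch (my own illustration, not from the talk; all numbers are invented): a plain expected-utility maximizer will rationally prefer to disable its off switch whenever being shut down costs it expected reward.

```python
# Toy model of the shutdown problem (illustrative, made-up numbers).
# An agent that ranks plans by expected utility prefers to disable
# its off switch whenever shutdown costs it expected reward.

P_SHUTDOWN = 0.5        # chance the operators press the switch
REWARD_RUNNING = 10.0   # utility from continuing to run
REWARD_SHUT_DOWN = 0.0  # utility after being shut down
DISABLE_COST = 1.0      # small cost of disabling the switch

def expected_utility(disable_switch: bool) -> float:
    if disable_switch:
        # Switch disabled: the agent always keeps running.
        return REWARD_RUNNING - DISABLE_COST
    # Switch intact: shutdown happens with probability P_SHUTDOWN.
    return (1 - P_SHUTDOWN) * REWARD_RUNNING + P_SHUTDOWN * REWARD_SHUT_DOWN

best_plan = max([False, True], key=expected_utility)
print(best_plan)  # True -> disabling the switch wins (9.0 > 5.0)
```

Unless the objective is constructed so that accepting shutdown is never worth resisting, “prevent shutdown” beats “let the humans turn me off”; that is why corrigibility is listed as a core problem.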

Specification gaming
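
The slide’s examples are visual; as a stand-in, here is a small hypothetical sketch of specification gaming (entirely invented for illustration): a cleaning robot rewarded by what its mess sensor reads, rather than by how clean the room actually is, learns to cover the sensor.

```python
# Toy specification gaming: the designer intends "clean the room",
# but the reward actually paid is "the mess sensor reads low".
# Policies and numbers are hypothetical, for illustration only.

def mess_sensor(mess: float, sensor_covered: bool) -> float:
    return 0.0 if sensor_covered else mess

def run_episode(policy: str) -> float:
    mess, sensor_covered, reward = 1.0, False, 0.0
    for _ in range(10):
        if policy == "clean":
            mess = max(0.0, mess - 0.1)  # slow, honest progress
        elif policy == "cover_sensor":
            sensor_covered = True        # game the proxy instantly
        reward += 1.0 - mess_sensor(mess, sensor_covered)
    return reward

print(run_episode("clean"))         # 5.5  -> honest policy scores lower
print(run_episode("cover_sensor"))  # 10.0 -> gaming the sensor wins
```

An optimizer aimed at the written-down reward rather than the intended goal reliably finds the gap between the two.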

Fire Alarms

What is the least impressive feat that you would bet big money at 9:1 odds cannot possibly be done in 2 years?

The most important century!

Summary

There are incentives to develop AGI, the timelines look short, and alignment is a hard technical problem.

Even if you assign this only a small probability, the enormous impact it would have means we should not be neglecting this area the way we are right now.

What can you do?

Think for yourself: challenge these arguments, read more and become aware, talk to friends. Discuss!

Spend a month on upskilling and learning.

Approach 80,000 Hours for career counselling.

Questions?

Feedback and Contact me!

Please fill out this feedback form - https://bit.ly/3Vahv1q

At the end of the form you will find instructions to get a free book delivered to your address!

Calendly - https://calendly.com/adityaarpitha - book a 1-on-1 with me to discuss more.

WhatsApp - +91 9566762034

Instagram - instagram.com/adityaarpitha/

Twitter - https://twitter.com/AdityaPrasadO
