AI Safety

What risk does advanced AI pose, and what can we do?

Aditya Prasad, PhD, CDS, IISc

Who am I?

Agenda for the talk

  1. Effective Altruism and an intro to longtermism, a school of thought

  2. AI Safety and the alignment problem

Effective Altruism

Is the problem important? Unfairly neglected? Tractable? Do your solutions scale?

Effectiveness = doing the most good with the resources we have.

Inadequacy

Scope Insensitivity

Malaria is one of the leading causes of child mortality; it kills about half a million children every year.

To counter this cognitive bias, we can think of a single kid, Arun. He loves spending time in the library and is really good at video games. His brother is dying of malaria. 55 such kids are dying every hour of every day.
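
As a quick sanity check on that per-hour figure, here is a back-of-the-envelope conversion (assuming the half-million-per-year estimate quoted above):

```python
# Rough conversion of ~500,000 annual child malaria deaths to a per-hour rate.
deaths_per_year = 500_000
hours_per_year = 365 * 24                # 8,760 hours in a year
print(deaths_per_year / hours_per_year)  # ~57 per hour, the same ballpark as 55
```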

Broken Care-o-meters

Most of us implicitly trust our internal care-o-meters, but they do not handle scale well. It is time to rethink that: with more power comes more responsibility.

Future Humans

What do we owe future generations? We can act as trustees, helping create a flourishing world for the generations to come.

Longtermism

The question now becomes: “How can I best make the very long-term future go well?”

These arguments and their implications are studied as part of an emerging school of thought called longtermism.

AI Safety

How do we ensure that AI is deployed in ways that do not harm humanity? We want to align the AI’s goals with our own.

Capability Progress

Text to images, and even short videos.

What is AGI?

A system that can perform a diverse enough set of the tasks humans can do that it is dangerous. It can do meta-learning and transfer learning across an enormous variety of cognitive domains, and can optimize across multiple domains without preprogrammed instincts.

It can invent tools and submodels. It has a model of the world and can use compute to search intelligently for an action. It is an agent, not a tool.

Is it possible?

We have one existence proof: humans. If the brain follows the laws of physics, then the substrate should not matter. At the very least, we could reach AGI via whole-brain simulation.

Timelines

The 2022 expert survey found that the aggregate forecast for a 50% chance of HLMI (high-level machine intelligence) was 37 years away, i.e. 2059. On Metaculus, the median forecast is 2028 and the 75th percentile is 2036.

Scaling Hypothesis

Orthogonality Thesis

Instrumental Convergence

AI Alignment

Early detection of deception and power-seeking behaviour

Interpretability research - explainability

Agent foundations - deconfusion of a pre-paradigmatic field

Tool alignment of existing LLMs - model exploration, red teaming

Core Problems

Corrigibility - the shutdown problem (a toy sketch follows on the next slide)
Ontology mismatch - is the natural abstraction hypothesis (NAH) true?
Inner alignment - mesa-optimizers can be misaligned
True names - specifications resistant to Goodharting
Sharp left turn - will alignment generalize the way capabilities do?
Gradient hacking - like how humans reward themselves
Oracle AI - is agentic behaviour inevitable after optimization?

Corrigibility

theturingprize.com
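
To make the shutdown problem concrete, here is a minimal toy sketch (my own illustration, not from the talk; all numbers are invented): a plain expected-utility maximizer will rationally prefer to disable its off switch whenever being shut down costs it expected reward.

```python
# Toy model of the shutdown problem (illustrative, made-up numbers).
# An agent that ranks plans by expected utility prefers to disable
# its off switch whenever shutdown costs it expected reward.

P_SHUTDOWN = 0.5        # chance the operators press the switch
REWARD_RUNNING = 10.0   # utility from continuing to run
REWARD_SHUT_DOWN = 0.0  # utility after being shut down
DISABLE_COST = 1.0      # small cost of disabling the switch

def expected_utility(disable_switch: bool) -> float:
    if disable_switch:
        # Switch disabled: the agent always keeps running.
        return REWARD_RUNNING - DISABLE_COST
    # Switch intact: shutdown happens with probability P_SHUTDOWN.
    return (1 - P_SHUTDOWN) * REWARD_RUNNING + P_SHUTDOWN * REWARD_SHUT_DOWN

best_plan = max([False, True], key=expected_utility)
print(best_plan)  # True -> disabling the switch wins (9.0 > 5.0)
```

Unless the objective is constructed so that accepting shutdown is never worth resisting, “prevent shutdown” beats “let the humans turn me off”; that is why corrigibility is listed as a core problem.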

Specification gaming
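
The slide’s examples are visual; as a stand-in, here is a small hypothetical sketch of specification gaming (entirely invented for illustration): a cleaning robot rewarded by what its mess sensor reads, rather than by how clean the room actually is, learns to cover the sensor.

```python
# Toy specification gaming: the designer intends "clean the room",
# but the reward actually paid is "the mess sensor reads low".
# Policies and numbers are hypothetical, for illustration only.

def mess_sensor(mess: float, sensor_covered: bool) -> float:
    return 0.0 if sensor_covered else mess

def run_episode(policy: str) -> float:
    mess, sensor_covered, reward = 1.0, False, 0.0
    for _ in range(10):
        if policy == "clean":
            mess = max(0.0, mess - 0.1)  # slow, honest progress
        elif policy == "cover_sensor":
            sensor_covered = True        # game the proxy instantly
        reward += 1.0 - mess_sensor(mess, sensor_covered)
    return reward

print(run_episode("clean"))         # 5.5  -> honest policy scores lower
print(run_episode("cover_sensor"))  # 10.0 -> gaming the sensor wins
```

An optimizer aimed at the written-down reward rather than the intended goal reliably finds the gap between the two.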

Fire Alarms

What is the least impressive feat that you would bet big money at 9:1 odds cannot possibly be done in 2 years?

The most important century!

Summary

There are incentives to develop AGI, the timelines look short, and alignment is a hard technical problem.

Even if you assign this only a small probability, the enormous impact it would have means we should not be neglecting this area the way we are right now.

What can you do?

Think for yourself: challenge these arguments, read more and become aware, talk to friends. Discuss!

Spend a month on upskilling and learning.

Approach 80,000 Hours for career counselling.

Questions?

Feedback and Contact me!

Please fill out this feedback form - https://bit.ly/3Vahv1q

At the end of the form you will find instructions to get a free book delivered to your address!

Calendly - https://calendly.com/adityaarpitha - book a 1-on-1 with me to discuss more.

WhatsApp - +91 9566762034

Instagram - instagram.com/adityaarpitha/

Twitter - https://twitter.com/AdityaPrasadO
