AI Safety
What risk does advanced AI pose, and what can we do?
Aditya Prasad, PhD CDS IISc
Who am I?
Agenda for the talk
Effective Altruism
Do your solutions scale? Is the problem important?
Unfairly neglected? Tractable?
Effectiveness = Doing the most good with the resources we have.
Inadequacy
Scope Insensitivity
Malaria is one of the leading causes of child mortality; it kills about half a million children every year.
To counter this cognitive bias, we can think of a single kid, Arun. He loves spending time in the library and is really good at video games. His brother is dying of malaria. About 55 such children are dying every hour of every day.
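As a quick sanity check on that per-hour figure, here is a minimal back-of-the-envelope calculation, assuming the slide's rough total of half a million deaths per year:

```python
# Back-of-the-envelope check of the per-hour figure, assuming the
# slide's rough total of ~500,000 child malaria deaths per year.
deaths_per_year = 500_000
hours_per_year = 365 * 24  # 8,760 hours in a year

deaths_per_hour = deaths_per_year / hours_per_year
print(f"~{deaths_per_hour:.0f} children per hour")  # ~57, consistent with the ~55 quoted
```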
Broken Care-o-meters
Most of us implicitly trust our internal care-o-meters, but they do not handle scale well. It is time to rethink that: with more power comes more responsibility.
Future Humans
What do we owe future generations?

We can act as trustees and help create a flourishing world for generations to come.
Longtermism
The question now becomes “How can I best make the very long-term future go well?”
These arguments and their implications are studied as part of an emerging school of thought called longtermism.
AI Safety
How to ensure that AI is deployed in ways that do not harm humanity.

We want to align the AI’s goals with our goals.
Capability Progress
Text to images and even short videos
What is AGI?
An AI that can perform a diverse enough set of the tasks humans can do that it becomes dangerous. It can do meta-learning and transfer learning across an enormous variety of cognitive domains, and can optimize across multiple domains without preprogrammed instincts.

It has the capability to invent tools and submodels. It has a model of the world and can use compute to search intelligently for an action. It is agentic, not a tool.
Is it possible?
We have one existence proof - humans. If the brain follows the laws of physics, then the substrate should not matter.

At the very least, we could reach AGI via whole brain simulation.
Timelines
The 2022 expert survey found an aggregate forecast of a 50% chance of HLMI within 37 years, i.e. by 2059.

On Metaculus, the median forecast is 2028 and the 75th percentile is 2036.
Scaling Hypothesis
Orthogonality Thesis
Instrumental Convergence
AI Alignment
Early detection of deception and power-seeking behaviour

Interpretability research - explainability
Agent foundations - deconfusion of a pre-paradigmatic field
Tool alignment of existing LLMs, model exploration, red teaming
Core Problems
Corrigibility - the shutdown problem
Ontology mismatch - is the Natural Abstraction Hypothesis (NAH) true?
Inner alignment - mesa-optimizers can be misaligned
True names - concepts resistant to Goodharting
Sharp left turn - will alignment generalize the way capabilities do?
Gradient hacking - like how humans reward themselves
Oracle AI - is agentic behaviour inevitable after optimization?
Corrigibility
theturingprize.com
Specification gaming
Fire Alarms
What is the least impressive feat that you would bet big money at 9-1 odds cannot possibly be done in 2 years?
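To make the 9-1 framing concrete, here is a minimal sketch of the break-even arithmetic (the helper function is illustrative, not from any library):

```python
# Betting at 9-1 odds that a feat CANNOT be done means risking 9 units
# to win 1. The bet breaks even when P(cannot be done) = 9 / (9 + 1),
# so taking it is only rational if you are more than 90% confident.
def break_even_probability(stake: float, payout: float) -> float:
    """Probability of winning at which risking `stake` to win `payout` breaks even."""
    return stake / (stake + payout)

print(break_even_probability(9, 1))  # 0.9
```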
The most important century!
Summary
There are incentives to develop AGI,
the timelines look short, and
alignment is a hard technical problem.

Even if you give this only a small probability of happening, the enormous impact it would have means we should not be neglecting this area as we are right now.
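A minimal expected-value sketch of that argument (all numbers are hypothetical, chosen only to show the shape of the reasoning):

```python
# Illustrative expected-value calculation: even a small probability of
# an enormous loss dominates. Both inputs below are assumed, not claims.
p_catastrophe = 0.01            # hypothetical: 1% chance of AGI catastrophe
lives_at_stake = 8_000_000_000  # hypothetical: today's population alone

expected_lives_lost = p_catastrophe * lives_at_stake
print(f"{expected_lives_lost:,.0f} lives lost in expectation")  # 80,000,000
```

Counting future generations, as longtermism suggests, only makes this number larger.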
What can you do?
Think for yourself, challenge these arguments, read more and become aware, talk to friends. Discuss!

Spend a month on upskilling and learning.
Approach 80,000 Hours for career counselling
Questions?
Feedback and Contact me!
Please fill this feedback form - https://bit.ly/3Vahv1q
At the end of the form you will find instructions to get a free book delivered to your address!
Calendly - https://calendly.com/adityaarpitha - book a 1-on-1 with me to discuss more.
WhatsApp - +91 9566762034
Instagram - instagram.com/adityaarpitha/
Twitter - https://twitter.com/AdityaPrasadO