AI Safety - Introduction
2024
Chon, Kilnam
KAIST
2024.11.1, rev. 2025.2.11
Future of Life Institute (FLI)
2
AI Safety Meetings
2023.11 AI Safety Summit, Bletchley Park, UK (with Bletchley Declaration)
2024.03 (Second) International Dialogue on AI Safety, Beijing
2024.05 AI Seoul (Safety) Summit (co-hosted by UK and South Korea)
2024.11 Inaugural Meeting of the International Network of AI Safety Institutes, San Francisco
2025.02 AI Action Summit, Paris, France
3
List of AI Safety Institutes
US
UK
Australia
Canada
European Commission
France
Japan
(Kenya)
Singapore
South Korea
4
Remarks and Issues
1. Global South (~G20) vs Global North (G7)
2. Multistakeholder vs Multilateral (vs Unilateral)
- Internet Governance with Multistakeholder
- AI Safety/Governance with Multilateral(?)
3. USA and China are the two leading countries in AI
5
AI Safety – Definition
AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence (AI) systems. It encompasses machine ethics and AI alignment, which aim to ensure AI systems are moral and beneficial, as well as monitoring AI systems for risks and enhancing their reliability. The field is particularly concerned with existential risks posed by advanced AI models.
6
Bletchley Declaration (excerpt)
Countries agreed that substantial risks arise from potential misuse or unintended issues of control of frontier AI, with particular concern about cybersecurity, biotechnology, and disinformation risks. The Declaration sets out agreement that there is “potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of AI models.” Countries also noted risks beyond frontier AI, including bias and privacy.
7
References
AI Safety Institute, Wikipedia, 2024.
AI Safety Newsletter #47: Reasoning Models, 2025.2.6.
Lauren Kahn, AI Safety and Automation Bias: The Downside of Human-in-the-Loop, CSET, 2024.11.
State of AI safety in China, Concordia AI, 2023.
Bletchley Declaration, 2023.11.
ChinAI Newsletter, 2010s~.
China AI Safety and Development Association, ChinAI Newsletter #299, 2025.2.12.
CSIS, Chinese Assessment of AI Safety, Risks and Approach to Mitigation, 2024.
FLI AI Safety Index, 2024.
D. Hassabis, Accelerating Science Discovery, 2024.11.
D. Hendrycks, Introduction to AI Safety, Ethics, and Society, 2024.
D. Janku, et al., We Have No Science of Safe AI, IICFG, 2024.
(Second) International Dialogue on AI Safety, Beijing, 2024.3.10-11.
AI Safety Summit Talks with Yoshua Bengio, YouTube, 2024.5.
Int’l AI Safety Report, AI Action Summit, 2025.1.
Int’l Scientific Report on the Safety of Advanced AI, AI Seoul (Safety) Summit, 2024.5.
8
References (continued)
Stuart Russell, What If We Succeed?, 2024.
Stuart Russell, General AI Safety, DDS&T, Lawrence Livermore, 2024.11.
USG announced global cooperation plan among AI safety organizations, 2024.5.
US Vision of AI Safety, Elizabeth Kelly, Director of AI Safety Institute, 2024.
USG, Framework to Advance AI Governance and Risk Management in National Security, 2024.
C. Wilson, US Can Win Without Compromising AI Safety, TechPolicy.press, 2024.11.
Yi Zeng, Keynote Speech, Closed-Door Meeting on Trustworthy AI, Wuzhen, 2024.
Zhi Zhong, Should We Shut Down AI? (AI & End of Humanity), YouTube, 2024.
9
Appendix: AI Security is an Emerging Field
Source: A. Nikolich, 2024 NSF Cybersecurity Summit
10
Appendix: AI Attacks
[Source: A. Nikolich, AI Security for Science, 2024 NSF Cybersecurity Summit]
11
Appendix: Stuart Russell
Make Safe AI vs Make AI Safe
12
Appendix: AI Safety Tests
C. Wilson, US can win without compromising AI safety, 2024.11.
13