1 of 13

AI Safety - Introduction

2024

Chon, Kilnam

KAIST

2024.11.1, rev. 2025.2.11

2 of 13

Future of Life Institute (FLI)

  • AI Safety Research Projects, 2015~
  • Asilomar Conference on Beneficial AI, 2017~2021.
  • AI Principles, 2022.

3 of 13

AI Safety Meetings

2023.11 AI Safety Summit, Bletchley Park, UK (with Bletchley Declaration)

2024.03 (Second) International Dialogue on AI Safety, Beijing

2024.05 AI Seoul (Safety) Summit (co-hosted by UK and South Korea)

2024.11 First International AI Safety Institute Meeting, San Francisco

2025.02 AI Action Summit, France

4 of 13

List of AI Safety Institutes

US

UK

Australia

Canada

European Commission

France

Japan

(Kenya)

Singapore

South Korea

5 of 13

Remarks and Issues

1. Global South (~G20) vs Global North (G7)

2. Multistakeholder vs Multilateral (vs Unilateral)

- Internet Governance with Multistakeholder

- AI Safety/Governance with Multilateral(?)

3. The USA and China are the two major countries in AI

6 of 13

AI Safety – Definition

AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence (AI) systems. It encompasses machine ethics and AI alignment, which aim to ensure AI systems are moral and beneficial, as well as monitoring AI systems for risks and enhancing their reliability. The field is particularly concerned with existential risks posed by advanced AI models.
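
A minimal sketch of the "monitoring" aspect mentioned above: wrap a stand-in model call so every output passes a risk check before it is released. All names here (run_model, RISK_PATTERNS, monitored_generate) are hypothetical illustrations for this slide, not any institute's actual method.

import re

# Illustrative patterns only; a real deployment would use a trained
# risk classifier rather than a regex blocklist (hypothetical example).
RISK_PATTERNS = [
    r"synthesi[sz]e .* pathogen",
    r"disable .* safety (check|filter)",
]

def run_model(prompt: str) -> str:
    # Stand-in for a real AI model call (hypothetical).
    return f"Echo: {prompt}"

def monitored_generate(prompt: str) -> str:
    # Generate a response, but withhold it if any risk pattern matches.
    output = run_model(prompt)
    for pattern in RISK_PATTERNS:
        if re.search(pattern, output, flags=re.IGNORECASE):
            return "[withheld by safety monitor]"
    return output

print(monitored_generate("Summarize the Bletchley Declaration."))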

7 of 13

Bletchley Declaration (excerpt)

Countries agreed that substantial risks may arise from potential misuse or unintended issues of control of frontier AI, with particular concern regarding cybersecurity, biotechnology, and disinformation risks. The Declaration sets out agreement that there is “potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of AI models.” Countries also noted risks beyond frontier AI, including bias and privacy.

8 of 13

References

AI Safety Institute, Wikipedia, 2024.

AI Safety Newsletter #47: Reasoning Models, 2025.2.6.

Lauren Kahn, AI Safety and Automation Bias: The Downside of Human-in-the-Loop, CSET, 2024.11.

State of AI safety in China, Concordia AI, 2023.

Bletchley Declaration, 2023.11.

ChinAI Newsletter, 2010s~

China AI Safety and Development Association, ChinAI Newsletter #299, 2025.2.12.

CSIS, Chinese assessment of AI safety, risks and approach to mitigation, 2024.

FLI AI Safety Index, 2024.

D. Hassabis, Accelerating science discovery, 2024.11.

D. Hendrycks, Introduction to AI Safety, Ethics, and Society, 2024.

D. Janků, et al., We have no science of safe AI, ICFG, 2024.

(Second) International Dialogue on AI Safety, Beijing, 2024.3.10-11.

AI Safety Summit Talks with Yoshua Bengio, YouTube, 2024.5.

Int'l AI Safety Report, AI Action Summit, 2025.1.

Int’l Scientific Report on the Safety of Advanced AI, AI Seoul (Safety) Summit, 2024.5.

9 of 13

References

Stuart Russell, What if we succeed?, 2024.

Stuart Russell, General AI safety, DDS&T, Lawrence Livermore, 2024.11.

USG announced global cooperation plan among AI safety organizations, 2024.5.

US Vision of AI Safety, Elizabeth Kelly, Director of AI Safety Institute, 2024.

USG, Framework to Advance AI Governance and Risk Management in National Security, 2024.

C. Wilson, US can win without compromising AI safety, TechPolicy.press, 2024.11.

Yi Zeng, Keynote Speech, Closed-Door Meeting on Trustworthy AI, Wuzhen, 2024.

Zhi Zhong, Should we shut down AI? (AI & End of Humanity), YouTube, 2024.

10 of 13

Appendix: AI Security is an Emerging Field

  • Data Science
  • Application Security
  • Network Security
  • Data Security
  • Risk Management/Auditing

Source: A. Nikolich, 2024 NSF Cybersecurity Summit

11 of 13

Appendix: AI Attacks

  • A type of security attack
  • Vulnerability
  • Threat model
  • AI poisoning attack (see the sketch below)
  • (more)

Source: A. Nikolich, AI Security for Science, 2024 NSF Cybersecurity Summit, 2024.
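
A toy illustration of the poisoning attack listed above (invented for this slide, not from Nikolich's talk): an attacker injects deliberately mislabeled training points so that a simple nearest-centroid classifier misclassifies a chosen input. All names and data here are hypothetical.

import random

def centroid(points):
    # Mean of a list of 2-D points.
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def train(data):
    # One centroid per class label.
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, x):
    # Classify x by the nearest class centroid (squared distance).
    return min(model, key=lambda lbl: (x[0] - model[lbl][0]) ** 2
                                      + (x[1] - model[lbl][1]) ** 2)

random.seed(0)
# Clean data: class 0 clusters near (0, 0), class 1 near (5, 5).
clean = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(50)] \
      + [((random.gauss(5, 1), random.gauss(5, 1)), 1) for _ in range(50)]

# Poisoning: inject points at the target location, mislabeled as class 0,
# dragging the class-0 centroid toward the target.
target = (3.0, 3.0)                      # genuinely closer to class 1
poison = [(target, 0)] * 50
print("clean model   :", predict(train(clean), target))           # -> 1
print("poisoned model:", predict(train(clean + poison), target))  # -> 0

In practice, defenses against this kind of attack include data provenance checks and outlier filtering on the training set.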

12 of 13

Appendix: Stuart Russell

Make Safe AI vs Make AI Safe

13 of 13

Appendix: AI Safety Tests

C. Wilson, US can win without compromising AI safety, 2024.11.
