1 of 11

Securing Artificial Intelligence: �An Overview

CAE-AI Workshop 2

Sagar Samtani, Ph.D.

Assistant Professor and Arthur M. Weimer Faculty Fellow

Executive Director, Data Science and Artificial Intelligence Lab

Kelley School of Business, Indiana University

1

2 of 11

2

Source: TopBots, 2021

3 of 11

AI Pain Shifts Towards Risk

3

4 of 11

4

Adopting AI = Adopting AI Risk

Problem Overview

Operational

Security�& Privacy

Ethical

Building & deployment, scalability, reliability & availability, monitoring & management

System vulnerabilities,�model theft, data theft, model evasion

Fairness, ethics, explainable Al, model abuse

5 of 11

5

ML Pipeline

Problem Overview

Data curation

Model training

Model validation

Model deployment

Monitoring

10 years

ago

Today

6 of 11

Background and Challenges

6

Training

2) Dataset Assembly

Validation

Test

5) Evaluation

3) Datasets

4) Learning Algorithm

1) Raw Data from

the World

6) Inputs

7) Model

8) Inference

Algorithm

9) Outputs

Diagram Adapted from Berryville Institute of Machine Learning (BIML; https://berryvilleiml.com)

Raw Data

Dataset Assembly

Datasets

Learning Algorithm

Evaluation

Inputs

Model

Inference Algorithm

Outputs

-Confidentiality

-Trustworthiness

-Storage

-Legal

-Encoding

-Representation

-Encoding

-Annotation

-Normalize

-Partitioning

-Fusion

-Filter

-Poisoning

-Transfer

-Dissimilarity

-Storage

-Supervisor

-Online

-Reproducibility

-Exploit-v-Explore

-Randomness

-Blind Spots

-Confidentiality

-Overfitting

-Bad eval data

-Cooking the books

-Catastrophic forgetting

-Adversarial examples

-Controlled input

-Dirty input

-Looped input

-Improper re-use

-Trojan

-Representation fluidity

-Training set reveal

-Online

-Inscrutability

-Hyperparameters

-Confidence scores

-Hosting

-Direct

-Provenance

-Inscrutability

-Transparency

Selected Risks Adapted from Berryville Institute of Machine Learning (BIML; https://berryvilleiml.com)

7 of 11

7

Data and Model Assurance

Inherited Risk from AI Supply Chain

Software & Platform Stack

CVE/CWE

Causative attacker influence

Exploratory attacker influence

Data curation

Model training

Model validation

Model deployment

Monitoring

Data provider

Model provider

Poisoning

Backdoor

Pickle ACE

Inversion

Stealing

Evasion

Abuse/Misuse

8 of 11

8

ML Pipeline

Problem Overview

Data curation

Model training

Model validation

Model deployment

Monitoring

10 years

ago

Today

9 of 11

Foundation Model Security

  • Various steps of the foundation model: (1) raw data, (2) inputs, (3) model, (4) inference algorithm, and (5) outputs.
  • Top ten LLM risks:
    1. Recursive pollution: can be spectacularly wrong
    2. Data debt: datasets too big to check
    3. Improper use: faith in transfer learning
    4. Black box opacity: undocumented, unstable APIs
    5. Prompt manipulation
    6. Data poisoning
    7. Reproducibility economics
    8. Data ownership
    9. Model trustworthiness
    10. Encoding integrity

9

LLM Components (credit; BIML

https://berryvilleiml.com/docs/BIML-LLM24.pdf)

10 of 11

Existing Landscape

10

Papers on Adversarial Examples

AI Vulnerabilities Tools and Resources

AI Risk Management Frameworks and Incident Databases

Source: N. Carlini;

https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html

Source: atlas.mitre.org

Source: incidentdatabase.ai

Source: berryvilleiml.com

Source: https://arxiv.org/pdf/2101.10865.pdf

Source: airisk.io

Firms

11 of 11

Thank you!

Questions or Comments?

_

Sagar Samtani, Ph.D.

Kelley School of Business, Indiana University

ssamtani@iu.edu

dsail.iu.edu

11