Email: Yanmin.Sun@forces.gc.ca
Defence Research Development Canada – Centre for Operational Research and Analysis
Defense Against Adversarial Attacks
to Large Language Models
– A Technology Survey
Dr. Yanmin Sun
Centre of Operational Research and Analysis
Outline�
https://www.synopsys.com/glossary/what-is-devsecops.html#:~:text=Definition,in%20the%20software%20delivery%20cycle.
2
Introduction
3
Adversarial Attacks on LLMs
4
Adversary Attacks on LLMs
Adapting LLMs to downstream tasks
5
Adversary Attacks on LLMs
Attacks
6
LLM Adaption
7
Adversarial Attacks on LLMs�
Adversary Attack Approaches
8
Data Attacks
Text perturbation –introducing variations and noise in data
9
Data Attacks
Backdoor Attack – Injecting a pre-designed trigger to the training data such that the victim model produces adversarial outputs when the trigger presented in input
10
Prompt Attacks
Automatic Prompt Attack
11
Prompt Attacks
Instruction Tuning Attacks – Inserting adversarial prompts into an instruction set for fine-tuning pre-trained LLMs
12
Defense Solutions
Defense against Adversarial Attacks
13
Defense Solutions
Defense Against Adversarial Data
14
Defense Solutions
Defense Against Adversarial Prompts
- tries to re-design a given instruction prompt to accomplish the target task and thwart the attack
- aims to detect whether a given prompt is attacked or not.
15
Defense Solutions
Defense Against Adversarial Prompts
16
Defense Solutions
17
Defense Solutions
Defense Against Adversarial Prompts
18
Conclusion Remarks��
19