Part II
Backdoor Attacks
[Figure: Standard training procedure vs. backdoor attack procedure. The standard pipeline spans the pre-training, training (forward/backward), post-training, and inference stages; the backdoor attack procedure adds poisoned data generation, backdoor injection, and backdoor activation, yielding a backdoored model.]
[Figure: Taxonomy of backdoor attack procedures. Both data-poisoning based and training-controllable based backdoor attacks consist of poisoned data generation, backdoor injection, and backdoor activation of the backdoored model.]
Data-poisoning based Backdoor Attack
[Figure: Data-poisoning based backdoor attacks follow a two-stage attack (poisoned data generation first, then backdoor injection through the victim's training), whereas training-controllable based backdoor attacks follow a one-stage attack (poisoned data generation and backdoor injection are carried out jointly).]
Data-poisoning based Backdoor Attack
Visible & Invisible
A visible trigger: the modification of the original sample, $x_\epsilon - x_0$, can be noticeable to human visual perception, but it does not interfere with a human's prediction.
Visible poisoned samples: BadNets
BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.
An invisible trigger: the modification of the original sample, $x_\epsilon - x_0$, is less detectable by human visual perception, while maintaining a high attack success rate.
· Alpha blending
· Digital steganography
· Adversarial perturbation
· Slight transformation
Alpha blending: Blended
The poisoned sample is generated by alpha blending, $x_\epsilon = \alpha \cdot t + (1 - \alpha) \cdot x_0$, where $t$ is the trigger (here, the Hello Kitty pattern).
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.
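For concreteness, a minimal sketch of this fusion step (the function name and the [0, 1] value range are assumptions for illustration, not from the paper):

```python
import numpy as np

def blend_poison(x0, trigger, alpha=0.2):
    """Alpha-blend a trigger into a benign image with pixel values
    in [0, 1]; a sketch of the Blended attack's fusion step."""
    return np.clip(alpha * trigger + (1.0 - alpha) * x0, 0.0, 1.0)
```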
Alpha blending: Blended (random pattern)
The same blending, $x_\epsilon = \alpha \cdot t + (1 - \alpha) \cdot x_0$, with a random pattern as the trigger $t$.
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.
Digital steganography: inserting a secret message into the image.
LSB: steganography via least significant bit modification, hiding the trigger in the lowest-order bits of each pixel.
Invisible backdoor attacks on deep neural networks via steganography and regularization. TDSC 2021.
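To make the LSB idea concrete, here is a minimal sketch, assuming 8-bit images and a bit-string payload (the helper names are hypothetical):

```python
import numpy as np

def lsb_embed(image: np.ndarray, message_bits: np.ndarray) -> np.ndarray:
    """Hide message bits in the least significant bit of each pixel;
    the change of at most one intensity level is invisible to humans."""
    flat = image.flatten()
    stego = flat.copy()
    n = len(message_bits)
    # Clear the lowest bit, then write the message bit into it.
    stego[:n] = (flat[:n] & 0xFE) | message_bits
    return stego.reshape(image.shape)

def lsb_extract(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the hidden bits back out of the lowest-order bit plane."""
    return image.flatten()[:n_bits] & 1

# Usage: embed a 16-bit payload into a random 8-bit image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
bits = rng.integers(0, 2, size=16, dtype=np.uint8)
assert np.array_equal(lsb_extract(lsb_embed(img, bits), 16), bits)
```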
Adversarial perturbation as the trigger.
UAP
Trigger: a universal adversarial perturbation (UAP), a single perturbation that fools the model on most inputs.
Universal adversarial perturbations. CVPR 2017.
Advdoor: uses a targeted universal adversarial perturbation as the trigger. Compared with the traditional patch backdoor, the trigger is less perceptible and harder to detect.
Advdoor: Adversarial backdoor attack of deep learning system. ISSTA 2021.
Slight transformation: a slight spatial transformation or a color distortion serves as the trigger.
WaNet
Warping, a basic image processing technique, is used as the trigger: a small, smooth warping field subtly deforms the image.
WaNet: Imperceptible Warping-based Backdoor Attack. ICLR 2021.
WaNet poisoned samples: [Figure: warped images are nearly indistinguishable from the originals]
WaNet: Imperceptible Warping-based Backdoor Attack. ICLR 2021.
DFST: uses color distortion, produced by a style-transfer generator, as the trigger.
Deep feature space trojan attack of neural networks by controlled detoxification. AAAI 2021.
Non-semantic & Semantic
The trigger has no semantic meaning, such as a small checkerboard grid or random noise.
Low Frequency Trigger
Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective. ICCV 2021.
Discrete cosine transform: the top-left corner holds the low-frequency components; the bottom-right holds the high-frequency components.
Observation: existing backdoor triggers leave pronounced high-frequency artifacts in poisoned images.
Inspiration: design a trigger confined to the low-frequency domain.
Low Frequency Trigger
Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective. ICCV 2021.
Smooth trigger: a low-pass filter applied in the frequency domain removes the trigger's high-frequency components.
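A minimal sketch of such a smooth trigger, assuming a random pattern whose DCT spectrum is restricted to the low-frequency block (the `keep` and `eps` knobs are illustrative, not the paper's values):

```python
import numpy as np
from scipy.fft import idctn

def smooth_trigger(shape=(32, 32), keep=8, eps=0.1, seed=0):
    """Draw random DCT coefficients in the low-frequency (top-left)
    block only, then invert the transform: the resulting trigger has
    no high-frequency artifacts by construction."""
    rng = np.random.default_rng(seed)
    coef = np.zeros(shape)
    coef[:keep, :keep] = rng.standard_normal((keep, keep))
    trig = idctn(coef, norm="ortho")
    return eps * trig / np.abs(trig).max()  # small perturbation budget

# Usage: add the smooth trigger to a benign image in [0, 1].
x0 = np.random.default_rng(1).random((32, 32))
x_poison = np.clip(x0 + smooth_trigger(), 0.0, 1.0)
```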
A semantic trigger corresponds to semantic objects with particular attributes in the benign sample, such as a red car in an image or a particular word in a sentence.
Composite
Patch-based attack (non-semantic trigger) vs. composite attack (semantic trigger).
Trojaning face recognition models: the composite trigger mixes existing benign features, e.g., two faces appearing together.
Composite backdoor attack for deep neural network by mixing existing benign features. CCS 2020.
Visible, Semantic, Sample-Specific, and Compatible (VSSC) Triggers
Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.
Observation: invisible triggers are fragile under real-world distortions, which motivates visible but semantically compatible triggers.
VSSC
Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.
Our approach: insert a chosen semantic object into benign samples with generative image-editing tools, keeping the trigger compatible with the scene.
VSSC
Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.
· Dogs + red flower → target label: Maltese Dog
· Dogs + harness → target label: Maltese Dog
· Foods + nuts → target label: Bread
· Foods + red flower → target label: Bread
A successful backdoor attack in the physical world, with a real object as the trigger.
Manually designed & Learnable
The trigger is manually designed by the attacker, such as a grid-square trigger, a cartoon pattern, or random noise.
BadNets | Blended | SIG | FaceHack
Grid square trigger (BadNets): [Figure: original vs. backdoor samples]
BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.
Cartoon pattern (Blended): [Figure: original vs. backdoor samples]
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.
Ramp signal (SIG): [Figure: original vs. backdoor samples]
A new backdoor attack in CNNs by training set corruption without label poisoning. ICIP 2019.
Makeup filter (FaceHack): [Figure: original vs. backdoor samples]
FaceHack: Triggering backdoored facial recognition systems using facial characteristics. IEEE TBIOM 2022.
A learnable trigger is generated by optimizing an objective function related to the benign sample or a model, in order to achieve particular goals.
Poison Frogs
Poison frogs! targeted clean-label poisoning attacks on neural networks. NeurIPS 2018.
The poison is optimized to look like the base instance in input space while colliding with the target instance in feature space.
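For reference, the feature-collision objective can be written as follows, with $f$ the feature extractor, $t$ the target instance, $b$ the base instance, and $\beta$ trading off the two terms:

$$ p \;=\; \arg\min_{x}\; \big\lVert f(x) - f(t) \big\rVert_2^2 \;+\; \beta\,\lVert x - b \rVert_2^2 $$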
Digital & Physical
Most existing backdoor attacks consider only a digital trigger, i.e., the trigger exists only in digital space during both the training and inference stages.
Sleeper Agent
Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch. NeurIPS 2022.
Overview of Sleeper Agent: poisoned training images are crafted by gradient matching and contain no visible trigger; the patch trigger appears only at inference.
A physical backdoor attack uses physical objects as triggers at the inference stage.
Physical attack
[Figure: digital trigger vs. physical trigger (a real object)]
Backdoor attacks against deep learning systems in the physical world. CVPR 2021.
PTB
PTB: Robust physical backdoor attacks against deep neural networks in real world. CVPR 2022.
Backdoor face image generation: [Figure]
Data-poisoning based Backdoor Attack
Additive & Non-additive
The poisoned image is an additive fusion of the benign image and the trigger: $x_\epsilon = x_0 + t$.
Invisible Poison
Poisoned samples based on different additive triggers: [Figure]
Invisible poison: A blackbox clean label backdoor attack to deep neural networks. ICCC 2021.
The poisoned image is generated by a non-additive transformation function, such as a color/style/attribute transformation or a spatial transformation.
IRBA
Poisoned sample generation via local nonlinear transformation of the 3D point cloud: [Figure]
Imperceptible and robust backdoor attack in 3D point cloud. arXiv 2022.
Static & Dynamic
A static trigger is fixed across the poisoned training samples, in both pattern and location.
BadNets | Blended
Static trigger characteristics: fixed across poisoned samples. [Figure: original samples vs. BadNets and Blended poisoned samples]
BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.
A dynamic trigger varies across the poisoned samples; this can be implemented by adding randomness to the fusion function or to the trigger transformation.
Random backdoor
Comparison between static and dynamic backdoors: [Figure]
Dynamic backdoor attacks against machine learning models. EuroS&P 2022.
Sample-agnostic & Sample-specific
A sample-agnostic trigger $x_\epsilon - x_0$ is independent of the benign sample $x_0$.
ROBNET
Poisoned sample generation: [Figure]
Defense-resistant backdoor attacks against deep neural networks in outsourced cloud environment. IEEE JSAC 2021.
A sample-specific trigger $x_\epsilon - x_0$ depends on the benign sample $x_0$. For the fusion function, one typical choice is the image steganography technique; another is a learned transformation.
Steganography technique: SSBA
SSBA uses an encoder-decoder network to embed an attacker-specified string into each image, so every poisoned sample carries its own trigger.
Invisible backdoor attack with sample-specific triggers. ICCV 2021.
Steganography technique: LSB
Invisible backdoor attacks on deep neural networks via steganography and regularization. TDSC 2021.
Transformation: Poison Ink
Poison Ink injects the trigger along the image's edge structures, making it both robust and invisible.
Poison ink: Robust and invisible backdoor attack. TIP 2022.
Data-poisoning based Backdoor Attack
Single-target class & Multi-target class
All poisoned training samples are labeled as one single target class.
BadNets all-to-one
All-to-one attack: the label of every backdoored input is changed to the single target label (e.g., airplane, bird, cat, and deer are all relabeled as automobile).
BadNets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019.
69
Multi-target classes mean that there are multiple target classes.
· all to all
· multi-target with multi-trigger
Single-target class & Multi-target class
All-to-all attack
The attack changes the label of class $i$ to class $i+1$ for backdoored inputs (e.g., airplane → automobile, automobile → bird, bird → cat).
BadNets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019.
Overview of BaN (Backdoor Generating Network)
A trigger generator produces the trigger, which is injected during training and activated at inference for the target label.
· Standard BaN: multi-trigger, one target.
· Conditional BaN: multi-trigger, multi-target; the difference is that the generator is conditioned on the target label.
Dynamic backdoor attacks against machine learning models. IEEE Euro S&P 2022.
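A minimal PyTorch sketch of a conditional, BaN-style trigger generator (the architecture and sizes are assumptions for illustration, not the paper's network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalTriggerGenerator(nn.Module):
    """Map noise plus a one-hot target label to a small trigger patch,
    so each target class gets its own trigger distribution."""
    def __init__(self, noise_dim=64, num_classes=10, patch=6):
        super().__init__()
        self.num_classes = num_classes
        self.patch = patch
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, 128),
            nn.ReLU(),
            nn.Linear(128, 3 * patch * patch),
            nn.Tanh(),  # bounded trigger values in [-1, 1]
        )

    def forward(self, z, target):
        onehot = F.one_hot(target, self.num_classes).float()
        out = self.net(torch.cat([z, onehot], dim=1))
        return out.view(-1, 3, self.patch, self.patch)

# Usage: one trigger per (noise, target-label) pair.
g = ConditionalTriggerGenerator()
triggers = g(torch.randn(4, 64), torch.tensor([0, 1, 2, 3]))
```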
Marksman Backdoor: a class-conditional trigger function allows the attacker to choose an arbitrary target class at inference.
Marksman backdoor: Backdoor attacks with arbitrary target class. NeurIPS 2022.
Label-inconsistent trigger & Label-consistent trigger
Label-inconsistent trigger: the poisoned sample's label is changed to the target class, so the image and the label are inconsistent.
Badnets | Blended | SSBA
Backdoor data: poisoned samples from BadNets, Blended, and SSBA, all labeled 'Airplane'. [Figure]
BadNets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019.
Targeted backdoor attacks on deep learning systems using data poisoning. arXiv 2017.
Invisible backdoor attack with sample-specific triggers. ICCV 2021.
Label-consistent trigger: the poisoned sample's label is not changed, so the image and the label are consistent. This is stealthier under human inspection.
LC: label-consistent trigger
Overview of data poisoning: benign sample → corruption → add trigger.
Label-consistent backdoor attacks. arXiv 2019.
LC: label-consistent trigger
Data preview: corrupted data and poisoned data with a one-corner trigger or a four-corner trigger. [Figure]
Label-consistent backdoor attacks. arXiv 2019.
LC: label-consistent trigger
Data corruption with a GAN: embed the images into the latent space, then generate an interpolated sample.
Label-consistent backdoor attacks. arXiv 2019.
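The two steps can be sketched in equations, assuming $G$ is the GAN generator, $z_1, z_2$ are the latent codes recovered for the two images being interpolated, and $\tau$ is the interpolation weight:

$$ z_i \;=\; \arg\min_{z}\; \lVert x_i - G(z) \rVert_2^2, \qquad \tilde{x} \;=\; G\big(\tau z_1 + (1-\tau)\, z_2\big) $$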
LC: label-consistent trigger
Data corruption with an adversarial attack: generate the sample by adding a bounded adversarial perturbation, so the salient features become harder to learn and the model must rely on the trigger.
Label-consistent backdoor attacks. arXiv 2019.
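The corruption step is the standard bounded adversarial objective, which the paper instantiates with PGD ($f$ is a pre-trained classifier and $\epsilon$ the perturbation budget):

$$ \tilde{x} \;=\; \arg\max_{\lVert x' - x \rVert_\infty \,\le\, \epsilon}\; \mathcal{L}\big(f(x'),\, y\big) $$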
Refool: reflection backdoor
Examples of reflection in real life: reflections in natural images often influence the performance of computer vision models.
Reflection backdoor: A natural backdoor attack on deep neural networks. ECCV 2020.
Refool: reflection backdoor
Trigger: a variety of reflections.
Poisoned sample generation: clean image + reflection (as through glass) = backdoor image.
Reflection backdoor: A natural backdoor attack on deep neural networks. ECCV 2020.
Random strategy: poisoned samples are selected from the benign training dataset uniformly at random.
Filtering-and-Updating Strategy (FUS)
Initialize the poisoned samples from the candidate poisoned dataset by random sampling. Then, for each step:
· Filtering: train a backdoored model, record forgetting events, and filter out the samples with low forgetting counts.
· Updating: replace the filtered-out samples with new randomly sampled ones.
Data-Efficient Backdoor Attacks. IJCAI 2022.
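A hedged sketch of the FUS loop; `train_and_count_forgetting` is an assumed helper that trains a backdoored model and returns, per selected sample, how often it is forgotten during training:

```python
import numpy as np

def fus_select(n_candidates, train_and_count_forgetting, n_poison,
               steps=10, filter_ratio=0.2, seed=0):
    """Iteratively keep high-forgetting poisoned samples and refresh
    the rest by random sampling."""
    rng = np.random.default_rng(seed)
    pool = np.arange(n_candidates)
    selected = rng.choice(pool, size=n_poison, replace=False)
    for _ in range(steps):
        forgets = np.asarray(train_and_count_forgetting(selected))
        n_drop = int(filter_ratio * n_poison)
        # Filtering: drop the samples with the lowest forgetting counts.
        keep = selected[np.argsort(forgets)[n_drop:]]
        # Updating: replace them with fresh random candidates.
        fresh = rng.choice(np.setdiff1d(pool, keep), size=n_drop,
                           replace=False)
        selected = np.concatenate([keep, fresh])
    return selected
```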
Learnable Poisoning Sample Selection (LPS) Strategy
Key idea: hard poisoning samples are more important than easy ones.
Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv 2023.
LPS strategy: min-max optimization. For each step, the inner optimization trains the surrogate model on the currently selected poisoned samples, while the outer optimization updates a learnable selection mask over the candidate samples.
Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv 2023.
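One hedged reading of the min-max formulation, with $m$ a binary mask selecting $k$ of the $N$ candidate poisoned samples $x_i^{\epsilon}$ (the paper's exact weighting may differ):

$$ \max_{m \in \{0,1\}^N,\; \sum_i m_i = k} \;\; \min_{\theta} \;\; \sum_{i=1}^{N} m_i\, \mathcal{L}\big(f_\theta(x_i^{\epsilon}),\, y_t\big) \;+\; \sum_{j} \mathcal{L}\big(f_\theta(x_j),\, y_j\big) $$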
One-stage training & Two-stage training
Imperceptible Backdoor Attack (one-stage): the trigger generator and the classification model are optimized together.
Imperceptible backdoor attack: From input space to feature representation. IJCAI 2022.
Training process of LIRA, where $f$ is the classification model and $T$ is the trigger generator; the two are optimized jointly.
LIRA: Learnable, Imperceptible and Robust Backdoor Attacks. ICCV 2021.
Full access to training data & Partial access to training data
Full access: the attacker has full access to the training data, so any training sample can be manipulated.
Partial access: the attacker cannot access all of the data; for example, in distributed or federated learning (FL), each client can only access part of the training data.
Neurotoxin: backdoors with FL
Backdoors can be inserted into FL models, but they do not remain in the model after the attacker stops uploading poisoned updates.
Neurotoxin: Durable backdoors in federated learning. ICML 2022.
Neurotoxin
[Figure: benign clients and an adversarial client upload parameters to the server, which distributes the aggregated parameters back.]
The adversarial client inspects the benign gradient, identifies the neurons most important for clean features, filters them out, and uploads its malicious gradient only on the remaining, rarely-updated coordinates, making the backdoor durable.
Neurotoxin: Durable backdoors in federated learning. ICML 2022.
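A minimal sketch of the masking step, assuming the attacker can estimate the benign gradient from consecutive global models (the ratio is an illustrative knob):

```python
import numpy as np

def neurotoxin_mask(benign_grad, malicious_grad, top_k_ratio=0.1):
    """Zero the malicious gradient on the top-k coordinates (by
    magnitude) of the observed benign gradient, so the attack avoids
    parameters that benign clients update heavily."""
    k = int(top_k_ratio * benign_grad.size)
    heavy = np.argsort(np.abs(benign_grad.ravel()))[-k:]
    masked = malicious_grad.ravel().copy()
    masked[heavy] = 0.0  # attack only rarely-updated coordinates
    return masked.reshape(malicious_grad.shape)
```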
Full control of training & Partial control of training
Full control: the attacker can fully control the training process.
Partial control: sometimes the training process is separated into several stages handled by different trainers, so the attacker can control only part of it.
LWP: Layerwise Weight Poisoning
Each layer adds a loss during pre-training (under the attacker's control), followed by fine-tuning (out of the attacker's control). Backdoor neurons planted in shallow layers are difficult to eliminate by fine-tuning, whereas those in deep layers are easy to eliminate.
Backdoor attacks on pre-trained models by layerwise weight poisoning. EMNLP 2021.
PPT: Poisoned Prompt Tuning
Overview of PPT: a poisoned soft prompt is tuned so that poisoned text (original text plus the trigger) is misclassified, while the original text behaves normally.
PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning. IJCAI 2022.
Different components of the training procedure
Backdoor attack mode: training algorithm | training loss | training order
Input-Aware (training loss): the classification loss under the cross-trigger mode enforces trigger non-reusability, i.e., a trigger generated for one sample must not activate the backdoor on another sample.
Input-Aware Dynamic Backdoor Attack. NeurIPS 2020.
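The cross-trigger term can be sketched as follows, with $T$ the trigger generator, $\oplus$ the fusion operator, and $\ell$ the classification loss (notation assumed):

$$ \mathcal{L}_{\text{cross}} \;=\; \mathbb{E}_{(x_i,\, y_i),\; x_j \neq x_i} \Big[ \ell\big( f\big(x_i \oplus T(x_j)\big),\; y_i \big) \Big] $$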
LIRA (training algorithm): the objective function couples the classifier and the trigger generator in a constrained, bilevel problem.
LIRA: Learnable, Imperceptible and Robust Backdoor Attacks. ICCV 2021.
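A hedged sketch of the constrained objective, with $T_\xi$ the trigger generator, $f_\theta$ the classifier, $\eta$ the target-label mapping, and $\beta$ a weighting term (the paper's exact weighting may differ):

$$ \min_{\xi} \sum_{(x,y)} \mathcal{L}\big(f_{\theta^{*}(\xi)}(T_\xi(x)),\; \eta(y)\big) \quad \text{s.t.} \quad \theta^{*}(\xi) \;=\; \arg\min_{\theta} \sum_{(x,y)} \Big[ \mathcal{L}\big(f_\theta(x),\, y\big) + \beta\, \mathcal{L}\big(f_\theta(T_\xi(x)),\, \eta(y)\big) \Big] $$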
BOB: Batch-Order Backdoor (training order)
Overview of BOB: a surrogate model trained on poisoned/adversarial samples provides gradient guidance, and the attacker manipulates the batch order of clean data so that the resulting gradients approximate the guided gradient.
Manipulating SGD with Data Ordering Attacks. NeurIPS 2021.
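A minimal sketch of the reordering idea, assuming per-sample gradients from the surrogate and a single guidance direction (a simplification of the paper's batch-level procedure):

```python
import numpy as np

def reorder_batches(per_sample_grads, target_grad, batch_size):
    """Order samples so that the most guidance-aligned gradients come
    first; early batches then push the model in the attacker's
    direction. `per_sample_grads` is (N, D); `target_grad` is (D,)."""
    scores = per_sample_grads @ target_grad   # alignment per sample
    order = np.argsort(-scores)               # most aligned first
    return [order[i:i + batch_size]
            for i in range(0, len(order), batch_size)]
```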