1 of 102

1

Part II

Backdoor Attacks

2 of 102

2

Pre-training stage

Training stage

backward

forward

Post-training stage

Inference stage

Poisoned data generation

Backdoor injection

Backdoor activation

Backdoored model

backward

forward

Standard training procedure

Backdoor attack procedure

3 of 102

3

Poisoned data generation

Backdoor injection

Backdoor activation

Backdoored model

Data-poisoning based Backdoor Attack

Training-controllable based Backdoor Attack

backward

forward

4 of 102

4

 

 

 

Data-poisoning based Backdoor Attack

 

5 of 102

5

Poisoned data generation

Backdoor injection

backward

forward

Training-controllable based Backdoor Attack

Poisoned data generation

Backdoor injection

backward

forward

Two-stage attack

One-stage attack

 

 

 

6 of 102

6

 

 

 

7 of 102

7

 

 

 

Data-poisoning based Backdoor Attack

 

8 of 102

8

 

Visible & Invisible

9 of 102

9

The modification of the original sample, x_ε − x_0, can be noticeable to human visual perception, but it does not interfere with a human's prediction.

 

10 of 102

10

  • On MNIST dataset
  • On U.S. traffic signs dataset

Visible poisoned samples

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.

BadNets

11 of 102

11

The modification of the original sample, x_ε − x_0, is less detectable by human visual perception, while maintaining a high attack success rate.

· Alpha blending · Digital steganography

· Adversarial perturbation · Slight transformation

 

12 of 102

12

Trigger: the Hello Kitty pattern

Blended sample: x_ε = α · trigger + (1 − α) · x_0

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.
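A minimal sketch of the blending step above (hypothetical NumPy helper `blend_trigger`; `image` and `trigger` are assumed to be same-shape float arrays in [0, 1]):

```python
import numpy as np

def blend_trigger(image: np.ndarray, trigger: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Alpha-blend a trigger pattern (e.g., the Hello Kitty image) into a benign image.

    The poisoned sample is x_eps = alpha * trigger + (1 - alpha) * x_0,
    so a small alpha keeps the modification hard to notice.
    """
    assert image.shape == trigger.shape, "trigger must be resized to the image shape"
    return np.clip(alpha * trigger + (1.0 - alpha) * image, 0.0, 1.0)
```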

13 of 102

13

Trigger: a random pattern

Blended sample: x_ε = α · trigger + (1 − α) · x_0

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.

14 of 102

14

  • Digital steganography can conceal the existence of a secret message without changing the apparent content of the file.

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Digital steganography

insert secret message

15 of 102

15

Steganography: Least significant bit modification

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

LSB

Invisible backdoor attacks on deep neural networks via steganography and regularization. TDSC 2021.
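A minimal sketch of least-significant-bit embedding (hypothetical NumPy helper `embed_lsb`; the trigger is assumed to be a binary array broadcastable to the image shape):

```python
import numpy as np

def embed_lsb(image: np.ndarray, trigger_bits: np.ndarray) -> np.ndarray:
    """Hide a binary trigger in the least significant bit of each uint8 pixel.

    Changing only the LSB alters each pixel value by at most 1,
    which is imperceptible to a human observer.
    """
    image = image.astype(np.uint8)
    trigger_bits = (trigger_bits > 0).astype(np.uint8)
    return (image & 0xFE) | trigger_bits
```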

16 of 102

16

  • Adversarial perturbation can be a small, imperceptible change made to an image that causes the model to misclassify it.

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Adversarial perturbation

17 of 102

17

UAP

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Universal adversarial perturbations. CVPR 2017.

  • UAP (Universal Adversarial Perturbation) is a type of adversarial perturbation.
  • UAPs are universal, meaning that they can be applied to any input image and still cause the model to misclassify it.

Trigger: UAP

18 of 102

18

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Advdoor

Advdoor: Adversarial backdoor attack of deep learning system. ISSTA 2021.

  • Adversarial Backdoor can effectively attack the model without modifying the original training process.

Compared with traditional patch backdoor:

19 of 102

19

  • Slight transformation for an image refers to a small, intentional modification of the image’s characteristics, such as slight spatial transformation or color distortion.
  • Slight transformation can be used to improve the performance and robustness of machine learning models.

Slight Spatial transformation

Color Distortion

Slight transformation

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

20 of 102

20

  • Image warping is the process of digitally manipulating an image such that any shapes portrayed in the image could be significantly distorted.

WaNet

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Wanet - imperceptible warping based backdoor attack. ICLR 2021.

Warping: a basic image processing technique
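An illustrative warping sketch in PyTorch (not the exact WaNet generator; the random coarse flow field, `strength`, and `grid_size` are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def warp(images: torch.Tensor, strength: float = 0.5, grid_size: int = 4) -> torch.Tensor:
    """Apply a small random elastic warp to a batch of images (N, C, H, W)."""
    n, _, h, w = images.shape
    # Coarse random flow field in [-1, 1], upsampled to full resolution
    flow = torch.rand(n, 2, grid_size, grid_size) * 2 - 1
    flow = F.interpolate(flow, size=(h, w), mode="bicubic", align_corners=True)
    flow = flow.permute(0, 2, 3, 1) * (strength / h)      # small per-pixel offsets
    # Identity sampling grid in normalized coordinates, as expected by grid_sample
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    return F.grid_sample(images, identity + flow, align_corners=True)
```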

21 of 102

21

Poisoned samples:

  • Different hyperparameters have different effects on warping results.
  • For each warped image, we show the image (top), the magnified (×2) residual map (bottom).

Wanet - imperceptible warping based backdoor attack. ICLR 2021.

WaNet

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

22 of 102

22

Use color distortion as trigger:

  • A CycleGAN is trained to serve as the trigger generator, which aims to derive a generative model that can transfer the features encoded in the style input set to the training inputs.

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

DFST

Deep feature space trojan attack of neural networks by controlled detoxification. AAAI 2021.

23 of 102

23

Non-semantic & Semantic

 

24 of 102

24

The trigger has no semantic meaning, such as a small checkerboard grid or random noise.

 

25 of 102

25

Low Frequency Trigger

Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective. ICCV 2021

Discrete cosine transform: the top-left corner is low frequency, the bottom-right is high frequency

Observations:

  • For a clean image, there is no high-frequency signal
  • For poisoned images, there are always high-frequency artifacts

Inspiration:

  • High frequency artifacts could be utilized as a common discriminative feature for poisoned image detection
  • Then, how to evade that defense?

26 of 102

26

Low Frequency Trigger

Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective. ICCV 2021

Smooth trigger: a low pass filter in frequency domain
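A minimal sketch of constructing such a smooth, low-frequency trigger with a DCT low-pass mask (illustrative SciPy code; the helper name `smooth_trigger` and the `cutoff` choice are assumptions, not the paper's exact procedure):

```python
import numpy as np
from scipy.fft import dctn, idctn

def smooth_trigger(trigger: np.ndarray, cutoff: int = 8) -> np.ndarray:
    """Keep only the low-frequency DCT coefficients of a 2-D trigger pattern.

    In the DCT domain the top-left corner holds low frequencies, so zeroing
    everything outside a small top-left block removes high-frequency artifacts.
    """
    coeffs = dctn(trigger, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:cutoff, :cutoff] = 1.0
    return idctn(coeffs * mask, norm="ortho")
```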

27 of 102

27

The trigger corresponds to semantic objects with particular attributes contained in the benign sample, such as a red car in an image or a particular word in a sentence.

 

28 of 102

28

Composite

Trojaning face recognition models:

  • Patch-based attack (non-semantic trigger)
  • Composite attack (semantic trigger)

Composite backdoor attack for deep neural network by mixing existing benign features. CCS 2020

29 of 102

29

Visible, Semantic, Sample-Specific, and Compatible (VSSC) Triggers

Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.

Observations:

  • Most invisible triggers are vulnerable to visual distortion from digital image processing or the physical world
  • Visible triggers are more robust to visual distortion, but are not stealthy under human perception

30 of 102

30

VSSC

Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.

    • Our goal: stealthy and robust trigger in both digital and physical scenarios
    • Our solution: We define a novel trigger with visible, sample-specific, semantic, and compatible characteristics

Our approach:

  • Step 1: text trigger selection via large language model
  • Step 2: insert the selected text trigger into the benign image via a text-guided image editing technique (e.g., Stable Diffusion)

31 of 102

31

VSSC

Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.

Dogs + Red flower

Dogs + Harness

Target label: Maltese Dog

Target label: Maltese Dog

Foods + Nuts

Foods + Red flower

Target label: Bread

Target label: Bread

Successful backdoor attack in the physical world, with a real object as the trigger

32 of 102

32

Manually designed & Learnable

 

33 of 102

33

The trigger is manually designed by the attacker, such as a grid square trigger, a cartoon pattern, random noise, etc.

 

34 of 102

34

BadNets | Blended | SIG | FaceHack

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

35 of 102

35

BadNets | Blended | SIG | FaceHack

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

36 of 102

36

BadNets | Blended | SIG | FaceHack

A new backdoor attack in cnns by training set corruption without label poisoning. ICIP 2019.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

37 of 102

37

BadNets | Blended | SIG | FaceHack

FaceHack: Triggering backdoored facial recognition systems using facial characteristics. IEEE TBBIS 2022.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

38 of 102

38

The trigger is generated by optimizing an objective function related to the benign sample or the model, in order to achieve particular goals.

 

39 of 102

39

Poison frogs

  • Minimize the L2 distance to the target instance in feature space

Poison frogs! targeted clean-label poisoning attacks on neural networks. NeurIPS 2018.

  • Minimize the Frobenius distance to the base instance in input space

Target instance

Base instance
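Combining the two objectives above (with f the feature extractor, t the target instance, b the base instance, and β a trade-off weight), the poisoned sample is crafted as:

```latex
x_p = \arg\min_{x} \; \big\| f(x) - f(t) \big\|_2^2 \;+\; \beta \, \big\| x - b \big\|_F^2
```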

40 of 102

40

Digital & Physical

 

41 of 102

41

Most existing backdoor attack works only consider digital triggers, i.e., the trigger exists only in digital space in both the training and inference stages.

 

42 of 102

42

Sleeper Agent

Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch. NeurIPS 2022.

  • A small proportion of slightly perturbed data carrying the trigger is added to the training set, which “backdoors” the model so that it misclassifies patched images at inference.

Overview of Sleeper Agent :

43 of 102

43

A physical backdoor attack uses physical objects as triggers in the inference stage.

 

44 of 102

44

Physical Attack

Digital Trigger

Physical Trigger

(real object)

Backdoor attacks against deep learning systems in the physical world. CVPR 2021.

45 of 102

45

PTB

Ptb: Robust physical backdoor attacks against deep neural networks in real world. CVPR 2022.

Backdoor face images generation:

46 of 102

46

 

 

 

Data-poisoning based Backdoor Attack

 

47 of 102

47

 

Additive & Non-additive

48 of 102

48

The poisoned image is the additive fusion of the benign image and trigger.

 

49 of 102

49

Invisible Poison

Poisoned samples based on different additive triggers:

Invisible poison: A blackbox clean label backdoor attack to deep neural networks. ICCC 2021.

50 of 102

50

The poisoned image is generated by some types of non-additive transformation function, such as the color/style/attribute transformation, or the spatial transformation.

 

51 of 102

51

IRBA

Poisoned sample generation via local nonlinear transformation:

Imperceptible and robust backdoor attack in 3d point cloud. arXiv 2022.

52 of 102

52

Static & Dynamic

 

53 of 102

53

The trigger is fixed across the poisoned training samples, including the pattern and location.

 

54 of 102

54

BadNets | Blended

Original Samples

Static trigger’s characteristics: Fixed across poisoned samples

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.

BadNets

Blended

55 of 102

55

BadNets | Blended

Original Samples

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.

BadNets

Blended

Static trigger’s characteristics: Fixed across poisoned samples

56 of 102

56

The trigger varies across the poisoned samples; this can be implemented by adding randomness to the fusion function or to the trigger transformation.

 

57 of 102

57

Random backdoor

Comparison between static and dynamic backdoors:

  • Static backdoors have a fixed trigger.
  • Dynamic backdoors adopt different (but similar) triggers for the same target label.

Dynamic backdoor attacks against machine learning models. EuroS&P 2022.

58 of 102

58

Sample-agnostic & Sample-specific

 

59 of 102

59

The trigger x_ε − x_0 is independent of the benign sample x_0.

 

60 of 102

60

ROBNET

Poisoned sample generation:

Defense-resistant backdoor attacks against deep neural networks in outsourced cloud environment. IEEE JSAC 2021.

  • The trigger is generated to excite the selected neuron, e.g., the activation of the yellow neuron increases from 2 to 12. The generated trigger is then patched onto different training samples.

61 of 102

61

The trigger x_ε − x_0 depends on the benign sample x_0. For the fusion function, one typical choice is an image steganography technique; another is a transformation.

 

62 of 102

62

Steganography Technique

Transformation

SSBA

  • SSBA adopts a double-loop auto-encoder based digital steganography technique, which merges the trigger information into the benign image.

Invisible backdoor attack with sample-specific triggers. ICCV 2021.

Loop 1

Loop 2

  • The residual x_ε − x_0 is unique for each x_0, as the encoder is a nonlinear function.

63 of 102

63

Steganography Technique

Transformation

  • The least significant bits vary in different benign images, so x_ε − x_0 is specific to each x_0.

LSB

Invisible backdoor attacks on deep neural networks via steganography and regularization. TDSC 2021.

64 of 102

64

Steganography Technique

Transformation

Poison ink

  • Poison Ink extracts a black-and-white edge image from a benign image, then colorizes the edge image with a particular color to form the trigger.
  • Since the edge image is specific to each benign image, the trigger is sample-specific.

Poison ink: Robust and invisible backdoor attack. TIP 2022.

65 of 102

65

 

 

 

Data-poisoning based Backdoor Attack

 

66 of 102

66

Single-target class & Multi-target class

 

67 of 102

67

Single-target class & Multi-target class

All poisoned training samples are labeled as one single target class.

 

68 of 102

68

Badnets all-to-one

All-to-one attack

The attack changes the label of every poisoned sample to the single target label; in the illustrated example, samples from the other classes (airplane, bird, cat, deer) are all relabeled as the target class (automobile).

 

Badnets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019

69 of 102

69

Multi-target classes mean that there are multiple target classes.

· all to all

· multi-target with multi-trigger

Single-target class & Multi-target class

 

70 of 102

70

All-to-all attack

The attack changes the label of class i to class i + 1 for backdoored inputs, e.g., airplane → automobile, automobile → bird, bird → cat, and so on.

Badnets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019

all to all

multi-target with multi trigger

BadNets all-to-one
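A minimal sketch contrasting the two label-flipping rules (hypothetical helper `flip_label`; `num_classes` and `target` are assumptions for illustration):

```python
def flip_label(y: int, mode: str, num_classes: int = 10, target: int = 0) -> int:
    """Return the poisoned label for a sample whose clean label is y."""
    if mode == "all_to_one":
        return target                  # every poisoned sample gets the single target class
    if mode == "all_to_all":
        return (y + 1) % num_classes   # class i is relabeled as class i + 1
    raise ValueError(f"unknown mode: {mode}")
```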

71 of 102

71

Overview of BaN

all to all

multi-target with multi trigger

Dynamic backdoor attacks against machine learning models. IEEE Euro S&P 2022.

Figure labels: trigger generator, target label, backdoor injection, activation.

  • Standard BaN: multi-trigger, one target
  • Conditional BaN: multi-trigger, multi-target

BaN: Backdoor Generating Network

72 of 102

72

Marksman's Class-Conditional Trigger

all to all

BaN: Backdoor Generating Network

Marksman backdoor: Backdoor attacks with arbitrary target class. NeurIPS 2022

Marksman Backdoor

73 of 102

73

Label-inconsistent trigger & Label-consistent trigger

 

74 of 102

74

Label-inconsistent trigger & Label-consistent trigger

The poisoned sample’s label is changed to the target class, such that the image and the label are inconsistent.

 

75 of 102

75

Backdoor data

Badnets

Blended

SSBA

‘Airplane’

Badnets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019

Targeted backdoor attacks on deep learning systems using data poisoning. arXiv 2017

Invisible backdoor attack with sample-specific triggers. ICCV 2021

Badnets | Blended | SSBA

76 of 102

76

Label-inconsistent trigger & Label-consistent trigger

The poisoned sample’s label is not changed, such that the image and the label are consistent. This is stealthier under human inspection.

 

77 of 102

77

Label-consistent backdoor attacks. arXiv 2019

  • Destroying the original features of the benign image from the target class, while maintaining its visual appearance
  • Adding the trigger onto the corrupted image
  • Thus, although the label seems to be consistent with visual content, the model learns the mapping from the trigger to the target label

Overview of data poisoning

benign sample

corruption

add trigger

LC: Label-consistent trigger

78 of 102

78

Data preview

One-corner trigger

Four-corner trigger

Poisoned Data

Label-consistent backdoor attacks. arXiv 2019

Corrupted Data

LC: Label-consistent trigger

79 of 102

79

Data corruption with GAN

Label-consistent backdoor attacks. arXiv 2019

Embed to feature space:

Generate sample:

 

 

LC: Label-consistent trigger

80 of 102

80

Data corruption with adversarial attack

Label-consistent backdoor attacks. arXiv 2019

Generate sample:

 

LC: Label-consistent trigger
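A plausible form of the adversarial corruption step (assuming an ℓ∞-bounded perturbation of budget ε, as in standard adversarial-example generation; the notation is ours, not copied from the slide): the benign image of the target class is perturbed to maximize the loss on its own label before the trigger is added.

```latex
\tilde{x} = x + \arg\max_{\|\delta\|_\infty \le \varepsilon} \; \mathcal{L}\big( f(x + \delta),\; y \big)
```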

81 of 102

81

Examples of reflection in real life

Reflections that exist in natural images often influence the performance of computer vision models.

Reflection backdoor: A natural backdoor attack on deep neural networks. ECCV 2020.

Refool: Reflection backdoor

82 of 102

82

Reflection backdoor: A natural backdoor attack on deep neural networks. ECCV 2020

Trigger: a variety of reflections

Poisoned sample generation: clean image + reflection (through glass) → backdoor image

Refool: Reflection backdoor

83 of 102

83

  • Most backdoor attacks randomly select benign samples to be poisoned.
  • Are there more suitable samples for poisoning than random selection?

Benign training dataset

Random Strategy

84 of 102

84

Data-Efficient Backdoor Attacks. IJCAI 2022.

FUS Strategy

Figure: sample from the candidate poisoned dataset, filter by forgetting events, and update the poisoned samples.

Initialize the poisoned samples by random sampling. Then, for each step:

  • Filtering: train a backdoored model, record forgetting events, and filter out samples with low forgetting counts
  • Updating: replace the filtered-out samples by random sampling

 

Filtering-and-Updating (FUS) Strategy
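A schematic sketch of the FUS loop (illustrative Python; `count_forgetting_events` stands in for training a backdoored model and recording per-sample forgetting events, and is replaced by a random stub here so the sketch runs; `budget`, `steps`, and `filter_ratio` are assumed hyperparameters):

```python
import random

def count_forgetting_events(candidates):
    """Stub: in the real FUS strategy this trains a backdoored model and counts,
    per candidate sample, how often it is forgotten during training."""
    return {idx: random.randint(0, 10) for idx in candidates}

def fus_select(candidate_pool, budget, steps=5, filter_ratio=0.3):
    """Filtering-and-Updating Strategy: keep samples with high forgetting counts."""
    selected = random.sample(candidate_pool, budget)          # initialize randomly
    for _ in range(steps):
        events = count_forgetting_events(selected)
        # Filtering: drop the samples that are forgotten least often
        selected.sort(key=lambda idx: events[idx], reverse=True)
        kept = selected[: int(budget * (1 - filter_ratio))]
        # Updating: refill the freed slots by random sampling from the pool
        refill = random.sample([i for i in candidate_pool if i not in kept],
                               budget - len(kept))
        selected = kept + refill
    return selected

# Example: pick 100 poisoning indices out of 5000 candidates
indices = fus_select(list(range(5000)), budget=100)
```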

85 of 102

85

Hard poisoning samples are more important than easy ones.

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv 2023.

 

 

poisoning sample

benign sample

Learnable Poisoning Sample Selection Strategy (LPS) Strategy

86 of 102

86

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv 2023.

LPS Strategy: a min-max optimization over the poisoned and benign samples, alternating an outer optimization and an inner optimization at each step.

Learnable Poisoning Sample Selection Strategy (LPS) Strategy

87 of 102

87

One-stage training & Two-stage training

88 of 102

88

 

 

One-stage training & Two-stage training

89 of 102

89

The overview of Imperceptible Backdoor Attack

Imperceptible backdoor attack: From input space to feature representation. IJCAI 2022.

Optimize together

Imperceptible Backdoor Attack

90 of 102

90

Training process of Lira

Lira: Learnable, imperceptible and robust backdoor attacks

Lira: Learnable, imperceptible and robust backdoor attacks. ICCV 2021.

f: classification model

T: trigger generator

  • The trigger T and the classifier f are optimized iteratively.

91 of 102

91

Full access to training data & Partial access to training data

92 of 102

92

The attacker has full access to the training data, so any training sample can be manipulated.

The attacker does not have access to all of the data. For example, in distributed or federated learning (FL), each client can only access part of the training data.

Full access to training data & Partial access to training data

93 of 102

93

Backdoor with FL

Backdoors can be inserted into FL models, but they do not remain in the model after the attacker stops uploading poisoned updates.

Neurotoxin: Durable backdoors in federated learning. ICML 2022

Neurotoxin

94 of 102

94

Figure: benign clients and the adversarial client upload parameters to the server, which distributes the aggregated parameters back. The adversarial client inspects its gradient over neuron indices, filters out the neurons that are important for clean features, and uploads only the remaining gradient.

Neurotoxin: Durable backdoors in federated learning. ICML 2022

Neurotoxin
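A minimal sketch of this gradient-masking idea (illustrative NumPy code; the helper name `neurotoxin_mask`, the flattened-gradient representation, and `top_k_ratio` are assumptions, not the paper's exact implementation):

```python
import numpy as np

def neurotoxin_mask(malicious_grad: np.ndarray, benign_grad: np.ndarray,
                    top_k_ratio: float = 0.1) -> np.ndarray:
    """Zero out the coordinates that benign clients update most strongly,
    so the backdoor hides in rarely-updated parameters and persists
    after the attacker stops uploading poisoned updates."""
    k = int(top_k_ratio * benign_grad.size)
    top_k = np.argsort(np.abs(benign_grad).ravel())[-k:]   # most-used coordinates
    masked = malicious_grad.ravel().copy()
    masked[top_k] = 0.0                                     # project them out
    return masked.reshape(malicious_grad.shape)
```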

95 of 102

95

Full control of training & Partial control of training

96 of 102

96

The attacker has the chance to fully control the training process.

Sometimes the training process is separated into several stages handled by different trainers. Consequently, the attacker can only control part of the training process.

Full control of training & Partial control of training

97 of 102

97

Each layer adds a loss

pre-training (under control)

fine-tuning (out of control)

Backdoor attacks on pre-trained models by layerwise weight poisoning. EMNLP 2021

With the layer-wise loss, backdoor neurons are planted in shallow layers and are difficult to eliminate by fine-tuning

With ordinary poisoning, backdoor neurons sit in deep layers and are easy to eliminate by fine-tuning

LWP: Layer Weight Poisoning

98 of 102

98

PPT: Poisoned Prompt Tuning

The overview of PPT

PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning. IJCAI 2022.

poisoned text

prompt

original text

99 of 102

99

Different components of the training procedure

100 of 102

100

Backdoor Attack Mode

  • Clean mode: the network is encouraged to correctly recognize clean images.
  • Attack mode: the attack should be activated on poisoned data.
  • Cross-trigger mode: data poisoned with other triggers or noise should not activate the backdoor.

training algorithm

training loss

training order

Classification loss under cross-trigger mode:

Input-Aware dynamic backdoor attack. NeurIPS 2020

Input-Aware
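A plausible form of the cross-trigger loss (notation assumed, not copied from the paper: T(·) is the input-aware trigger generator, ⊕ the trigger-injection operator, and y_i the clean label): an input patched with a trigger generated for a different input must keep its original label.

```latex
\mathcal{L}_{\text{cross}} = \mathbb{E}_{i \ne j}\;
  \mathcal{L}_{\text{CE}}\big( f(x_i \oplus T(x_j)),\; y_i \big)
```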

101 of 102

101

Lira: Learnable, imperceptible and robust backdoor attacks

Lira: Learnable, imperceptible and robust backdoor attacks. ICCV 2021

The objective function:
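A plausible bi-level form of this objective, consistent with the iterative optimization of the trigger generator T and classifier f described on the earlier Lira slide (notation assumed: η(y) is the target-label mapping, α a weighting term, ε the perceptibility budget):

```latex
\min_{T} \sum_{(x,\,y)} \mathcal{L}\big( f_{\theta^*}(T(x)),\; \eta(y) \big)
\quad \text{s.t.} \quad
\theta^* = \arg\min_{\theta} \sum_{(x,\,y)}
  \Big[ \mathcal{L}\big( f_{\theta}(x),\; y \big)
      + \alpha\, \mathcal{L}\big( f_{\theta}(T(x)),\; \eta(y) \big) \Big],
\qquad \| T(x) - x \| \le \epsilon
```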

training algorithm

training loss

training order

102 of 102

102

Overview of BOB

training algorithm

training loss

training order

Manipulating SGD with Data Ordering Attacks. NeurIPS 2021

The surrogate model trained on poisoned/adversarial samples provides gradient guidance

BOB: Batch-Order Backdoor

Manipulating the batch order to approximate the guided gradient