1 of 102

1

Part II

Backdoor Attacks

2 of 102

2

Pre-training stage

Training stage

backward

forward

Post-training stage

Inference stage

Poisoned data generation

Backdoor injection

Backdoor activation

Backdoored model

backward

forward

Standard training procedure

Backdoor attack procedure

3 of 102

3

Poisoned data generation

Backdoor injection

Backdoor activation

Backdoored model

Data-poisoning based Backdoor Attack

Training-controllable based Backdoor Attack

backward

forward

4 of 102

4

 

 

 

Data-poisoning based Backdoor Attack

 

5 of 102

5

Poisoned data generation

Backdoor injection

backward

forward

Training-controllable based Backdoor Attack

Poisoned data generation

Backdoor injection

backward

forward

Two-stage attack

One-stage attack

 

 

 

6 of 102

6

 

 

 

7 of 102

7

 

 

 

Data-poisoning based Backdoor Attack

 

8 of 102

8

 

Visible & Invisible

9 of 102

9

The modification of the original sample, x_ε − x_0, can be noticeable to human visual perception, but it does not interfere with a human's prediction.

 

10 of 102

10

  • On MNIST dataset
  • On U.S. traffic signs dataset

Visible poisoned samples

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.

BadNets

11 of 102

11

The modification of the original sample, x_ε − x_0, is less detectable by human visual perception, while maintaining a high attack success rate.

· Alpha blending · Digital steganography

· Adversarial perturbation · Slight transformation

 

12 of 102

12

Trigger: the Hello Kitty pattern

Blended sample: x_ε = α · trigger + (1 − α) · x_0

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.
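A minimal sketch of the blending step above (hypothetical NumPy helper `blend_trigger`; `image` and `trigger` are assumed to be same-shape float arrays in [0, 1]):

```python
import numpy as np

def blend_trigger(image: np.ndarray, trigger: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Alpha-blend a trigger pattern (e.g., the Hello Kitty image) into a benign image.

    The poisoned sample is x_eps = alpha * trigger + (1 - alpha) * x_0,
    so a small alpha keeps the modification hard to notice.
    """
    assert image.shape == trigger.shape, "trigger must be resized to the image shape"
    return np.clip(alpha * trigger + (1.0 - alpha) * image, 0.0, 1.0)
```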

13 of 102

13

Trigger: a random pattern

Blended sample: x_ε = α · trigger + (1 − α) · x_0

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.

14 of 102

14

  • Digital steganography can conceal the existence of a secret message without changing the apparent content of the file.

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Digital steganography

insert secret message

15 of 102

15

Steganography: Least significant bit modification

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

LSB

Invisible backdoor attacks on deep neural networks via steganography and regularization. TDSC 2021.
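A minimal sketch of least-significant-bit embedding (hypothetical NumPy helper `embed_lsb`; the trigger is assumed to be a binary array broadcastable to the image shape):

```python
import numpy as np

def embed_lsb(image: np.ndarray, trigger_bits: np.ndarray) -> np.ndarray:
    """Hide a binary trigger in the least significant bit of each uint8 pixel.

    Changing only the LSB alters each pixel value by at most 1,
    which is imperceptible to a human observer.
    """
    image = image.astype(np.uint8)
    trigger_bits = (trigger_bits > 0).astype(np.uint8)
    return (image & 0xFE) | trigger_bits
```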

16 of 102

16

  • Adversarial perturbation can be a small, imperceptible change made to an image that causes the model to misclassify it.

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Adversarial perturbation

17 of 102

17

UAP

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Universal adversarial perturbations. CVPR 2017.

  • UAP (Universal Adversarial Perturbation) is a type of adversarial perturbation.
  • UAPs are universal, meaning that they can be applied to any input image and still cause the model to misclassify it.

Trigger: UAP

18 of 102

18

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Advdoor

Advdoor: Adversarial backdoor attack of deep learning system. ISSTA 2021.

  • Adversarial Backdoor can effectively attack the model without modifying the original training process.

Compared with traditional patch backdoor:

19 of 102

19

  • Slight transformation for an image refers to a small, intentional modification of the image’s characteristics, such as slight spatial transformation or color distortion.
  • Slight transformation can be used to improve the performance and robustness of machine learning models.

Slight Spatial transformation

Color Distortion

Slight transformation

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

20 of 102

20

  • Image warping is the process of digitally manipulating an image such that any shapes portrayed in the image could be significantly distorted.

WaNet

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

Wanet - imperceptible warping based backdoor attack. ICLR 2021.

Warping: a basic image processing technique
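An illustrative warping sketch in PyTorch (not the exact WaNet generator; the random coarse flow field, `strength`, and `grid_size` are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def warp(images: torch.Tensor, strength: float = 0.5, grid_size: int = 4) -> torch.Tensor:
    """Apply a small random elastic warp to a batch of images (N, C, H, W)."""
    n, _, h, w = images.shape
    # Coarse random flow field in [-1, 1], upsampled to full resolution
    flow = torch.rand(n, 2, grid_size, grid_size) * 2 - 1
    flow = F.interpolate(flow, size=(h, w), mode="bicubic", align_corners=True)
    flow = flow.permute(0, 2, 3, 1) * (strength / h)      # small per-pixel offsets
    # Identity sampling grid in normalized coordinates, as expected by grid_sample
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    return F.grid_sample(images, identity + flow, align_corners=True)
```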

21 of 102

21

Poisoned samples:

  • Different hyperparameters have different effects on warping results.
  • For each warped image, we show the image (top), the magnified (×2) residual map (bottom).

Wanet - imperceptible warping based backdoor attack. ICLR 2021.

WaNet

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

22 of 102

22

Use color distortion as trigger:

  • A CycleGAN is trained to serve as the trigger generator, which aims to derive a generative model that can transfer the features encoded in the style input set to the training inputs.

Alpha blending

Digital steganography

Adversarial perturbation

Slight transformation

DFST

Deep feature space trojan attack of neural networks by controlled detoxification. AAAI 2021.

23 of 102

23

Non-semantic & Semantic

 

24 of 102

24

The trigger has no semantic meaning, such as a small checkerboard grid or random noise.

 

25 of 102

25

Low Frequency Trigger

Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective. ICCV 2021

Discrete cosine transform: the top-left corner is low frequency, the bottom-right is high frequency

Observations:

  • For a clean image, there is no high-frequency signal
  • For poisoned images, there are always high-frequency artifacts

Inspiration:

  • High frequency artifacts could be utilized as a common discriminative feature for poisoned image detection
  • Then, how to evade that defense?

26 of 102

26

Low Frequency Trigger

Rethinking the Backdoor Attacks’ Triggers: A Frequency Perspective. ICCV 2021

Smooth trigger: a low pass filter in frequency domain
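A minimal sketch of constructing such a smooth, low-frequency trigger with a DCT low-pass mask (illustrative SciPy code; the helper name `smooth_trigger` and the `cutoff` choice are assumptions, not the paper's exact procedure):

```python
import numpy as np
from scipy.fft import dctn, idctn

def smooth_trigger(trigger: np.ndarray, cutoff: int = 8) -> np.ndarray:
    """Keep only the low-frequency DCT coefficients of a 2-D trigger pattern.

    In the DCT domain the top-left corner holds low frequencies, so zeroing
    everything outside a small top-left block removes high-frequency artifacts.
    """
    coeffs = dctn(trigger, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:cutoff, :cutoff] = 1.0
    return idctn(coeffs * mask, norm="ortho")
```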

27 of 102

27

The trigger corresponds to semantic objects with particular attributes contained in the benign sample, such as a red car in an image or a particular word in a sentence.

 

28 of 102

28

Composite

Trojaning face recognition models:

  • Patch-based attack (non-semantic trigger)
  • Composite attack (semantic trigger)

Composite backdoor attack for deep neural network by mixing existing benign features. CCS 2020

29 of 102

29

Visible, Semantic, Sample-Specific, and Compatible (VSSC) Triggers

Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.

Observations:

  • Most invisible triggers are vulnerable to visual distortion from digital image processing or the physical world
  • Visible triggers are more robust to visual distortion, but are not stealthy under human perception

30 of 102

30

VSSC

Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.

    • Our goal: stealthy and robust trigger in both digital and physical scenarios
    • Our solution: We define a novel trigger with visible, sample-specific, semantic, and compatible characteristics

Our approach:

  • Step 1: text trigger selection via large language model
  • Step 2: insert the selected text trigger into the benign image via a text-guided image editing technique (e.g., Stable Diffusion)

31 of 102

31

VSSC

Robust Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers. arXiv 2023.

Dogs + Red flower

Dogs + Harness

Target label: Maltese Dog

Target label: Maltese Dog

Foods + Nuts

Foods + Red flower

Target label: Bread

Target label: Bread

Successful backdoor attack in the physical world, with a real object as the trigger

32 of 102

32

Manually designed & Learnable

 

33 of 102

33

The trigger is manually designed by the attacker, such as a grid square trigger, a cartoon pattern, random noise, etc.

 

34 of 102

34

BadNets | Blended | SIG | FaceHack

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

35 of 102

35

BadNets | Blended | SIG | FaceHack

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

36 of 102

36

BadNets | Blended | SIG | FaceHack

A new backdoor attack in cnns by training set corruption without label poisoning. ICIP 2019.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

37 of 102

37

BadNets | Blended | SIG | FaceHack

FaceHack: Triggering backdoored facial recognition systems using facial characteristics. IEEE TBBIS 2022.

Grid square trigger:

Cartoon pattern:

backdoor

original

backdoor

original

Ramp signal:

backdoor

original

Makeup filter:

backdoor

original

38 of 102

38

The trigger is generated by optimizing an objective function related to the benign sample or the model, in order to achieve particular goals.

 

39 of 102

39

Poison frogs

  • Minimize the L2 distance to the target instance in feature space

Poison frogs! targeted clean-label poisoning attacks on neural networks. NeurIPS 2018.

  • Minimize the Frobenius distance to the base instance in input space

Target instance

Base instance
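Combining the two objectives above (with f the feature extractor, t the target instance, b the base instance, and β a trade-off weight), the poisoned sample is crafted as:

```latex
x_p = \arg\min_{x} \; \big\| f(x) - f(t) \big\|_2^2 \;+\; \beta \, \big\| x - b \big\|_F^2
```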

40 of 102

40

Digital & Physical

 

41 of 102

41

Most existing backdoor attack works only consider digital triggers, i.e., the trigger exists only in digital space in both the training and inference stages.

 

42 of 102

42

Sleeper Agent

Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch. NeurIPS 2022.

  • A small proportion of slightly perturbed data carrying the trigger is added to the training set, which “backdoors” the model so that it misclassifies patched images at inference.

Overview of Sleeper Agent :

43 of 102

43

A physical backdoor attack uses physical objects as triggers in the inference stage.

 

44 of 102

44

Physical Attack

Digital Trigger

Physical Trigger

(real object)

Backdoor attacks against deep learning systems in the physical world. CVPR 2021.

45 of 102

45

PTB

Ptb: Robust physical backdoor attacks against deep neural networks in real world. CVPR 2022.

Backdoor face images generation:

46 of 102

46

 

 

 

Data-poisoning based Backdoor Attack

 

47 of 102

47

 

Additive & Non-additive

48 of 102

48

The poisoned image is the additive fusion of the benign image and trigger.

 

49 of 102

49

Invisible Poison

Poisoned samples based on different additive triggers:

Invisible poison: A blackbox clean label backdoor attack to deep neural networks. ICCC 2021.

50 of 102

50

The poisoned image is generated by some types of non-additive transformation function, such as the color/style/attribute transformation, or the spatial transformation.

 

51 of 102

51

IRBA

Poisoned sample generation via local nonlinear transformation:

Imperceptible and robust backdoor attack in 3d point cloud. arXiv 2022.

52 of 102

52

Static & Dynamic

 

53 of 102

53

The trigger is fixed across the poisoned training samples, including the pattern and location.

 

54 of 102

54

BadNets | Blended

Original Samples

Static trigger’s characteristics: Fixed across poisoned samples

BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019.

BadNets

Blended

55 of 102

55

BadNets | Blended

Original Samples

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv 2017.

BadNets

Blended

Static trigger’s characteristics: Fixed across poisoned samples

56 of 102

56

The trigger varies across the poisoned samples; this can be implemented by adding randomness to the fusion function or to the trigger transformation.

 

57 of 102

57

Random backdoor

Comparison between static and dynamic backdoors:

  • Static backdoors have a fixed trigger.
  • Dynamic backdoors adopt different (but similar) triggers for the same target label.

Dynamic backdoor attacks against machine learning models. EuroS&P 2022.

58 of 102

58

Sample-agnostic & Sample-specific

 

59 of 102

59

The trigger x_ε − x_0 is independent of the benign sample x_0.

 

60 of 102

60

ROBNET

Poisoned sample generation:

Defense-resistant backdoor attacks against deep neural networks in outsourced cloud environment. IEEE JSAC 2021.

  • The trigger is generated to excite the selected neuron, e.g., the activation of the yellow neuron increases from 2 to 12. The generated trigger is then patched onto different training samples.

61 of 102

61

The trigger x_ε − x_0 depends on the benign sample x_0. For the fusion function, one typical choice is an image steganography technique; another is a transformation.

 

62 of 102

62

Steganography Technique

Transformation

SSBA

  • SSBA adopts a double-loop auto-encoder based digital steganography technique, which merges the trigger information into the benign image.

Invisible backdoor attack with sample-specific triggers. ICCV 2021.

Loop 1

Loop 2

  • The residual x_ε − x_0 is unique for each x_0, as the encoder is a nonlinear function.

63 of 102

63

Steganography Technique

Transformation

  • The least significant bits vary in different benign images, so x_ε − x_0 is specific to each x_0.

LSB

Invisible backdoor attacks on deep neural networks via steganography and regularization. TDSC 2021.

64 of 102

64

Steganography Technique

Transformation

Poison ink

  • Poison Ink extracts a black-and-white edge image from a benign image, then colorizes the edge image with a particular color to form the trigger.
  • Since the edge image is specific to each benign image, the trigger is sample-specific.

Poison ink: Robust and invisible backdoor attack. TIP 2022.

65 of 102

65

 

 

 

Data-poisoning based Backdoor Attack

 

66 of 102

66

Single-target class & Multi-target class

 

67 of 102

67

Single-target class & Multi-target class

All poisoned training samples are labeled as one single target class.

 

68 of 102

68

Badnets all-to-one

All-to-one attack

The attack changes the label of every poisoned sample to the single target label; in the illustrated example, samples from the other classes (airplane, bird, cat, deer) are all relabeled as the target class (automobile).

 

Badnets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019

69 of 102

69

Multi-target classes mean that there are multiple target classes.

· all to all

· multi-target with multi-trigger

Single-target class & Multi-target class

 

70 of 102

70

All-to-all attack

The attack changes the label of class i to class i + 1 for backdoored inputs, e.g., airplane → automobile, automobile → bird, bird → cat, and so on.

Badnets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019

all to all

multi-target with multi trigger

BadNets all-to-one
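A minimal sketch contrasting the two label-flipping rules (hypothetical helper `flip_label`; `num_classes` and `target` are assumptions for illustration):

```python
def flip_label(y: int, mode: str, num_classes: int = 10, target: int = 0) -> int:
    """Return the poisoned label for a sample whose clean label is y."""
    if mode == "all_to_one":
        return target                  # every poisoned sample gets the single target class
    if mode == "all_to_all":
        return (y + 1) % num_classes   # class i is relabeled as class i + 1
    raise ValueError(f"unknown mode: {mode}")
```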

71 of 102

71

Overview of BaN

all to all

multi-target with multi trigger

Dynamic backdoor attacks against machine learning models. IEEE Euro S&P 2022.

Figure labels: trigger generator, target label, backdoor injection, activation.

  • Standard BaN: multi-trigger, one target
  • Conditional BaN: multi-trigger, multi-target

BaN: Backdoor Generating Network

72 of 102

72

Marksman's Class-Conditional Trigger

all to all

BaN: Backdoor Generating Network

Marksman backdoor: Backdoor attacks with arbitrary target class. NeurIPS 2022

Marksman Backdoor

73 of 102

73

Label-inconsistent trigger & Label-consistent trigger

 

74 of 102

74

Label-inconsistent trigger & Label-consistent trigger

The poisoned sample’s label is changed to the target class, such that the image and the label are inconsistent.

 

75 of 102

75

Backdoor data

Badnets

Blended

SSBA

‘Airplane’

Badnets: Identifying vulnerabilities in the machine learning model supply chain. IEEE Access 2019

Targeted backdoor attacks on deep learning systems using data poisoning. arXiv 2017

Invisible backdoor attack with sample-specific triggers. ICCV 2021

Badnets | Blended | SSBA

76 of 102

76

Label-inconsistent trigger & Label-consistent trigger

The poisoned sample’s label is not changed, such that the image and the label are consistent. This is stealthier under human inspection.

 

77 of 102

77

Label-consistent backdoor attacks. arXiv 2019

  • Destroying the original features of the benign image from the target class, while maintaining its visual appearance
  • Adding the trigger onto the corrupted image
  • Thus, although the label seems to be consistent with visual content, the model learns the mapping from the trigger to the target label

Overview of data poisoning

benign sample

corruption

add trigger

LC: Label-consistent trigger

78 of 102

78

Data preview

One-corner trigger

Four-corner trigger

Poisoned Data

Label-consistent backdoor attacks. arXiv 2019

Corrupted Data

LC: Label-consistent trigger

79 of 102

79

Data corruption with GAN

Label-consistent backdoor attacks. arXiv 2019

Embed to feature space:

Generate sample:

 

 

LC: Label-consistent trigger

80 of 102

80

Data corruption with adversarial attack

Label-consistent backdoor attacks. arXiv 2019

Generate sample:

 

LC: Label-consistent trigger
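A plausible form of the adversarial corruption step (assuming an ℓ∞-bounded perturbation of budget ε, as in standard adversarial-example generation; the notation is ours, not copied from the slide): the benign image of the target class is perturbed to maximize the loss on its own label before the trigger is added.

```latex
\tilde{x} = x + \arg\max_{\|\delta\|_\infty \le \varepsilon} \; \mathcal{L}\big( f(x + \delta),\; y \big)
```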

81 of 102

81

Examples of reflection in real life

Reflections that exist in natural images often influence the performance of computer vision models.

Reflection backdoor: A natural backdoor attack on deep neural networks. ECCV 2020.

Refool: Reflection backdoor

82 of 102

82

Reflection backdoor: A natural backdoor attack on deep neural networks. ECCV 2020

Trigger: a variety of reflections

Poisoned sample generation: clean image + reflection (through glass) → backdoor image

Refool: Reflection backdoor

83 of 102

83

  • Most backdoor attacks randomly select benign samples to be poisoned.
  • Are there more suitable samples for poisoning than random selection?

Benign training dataset

Random Strategy

84 of 102

84

Data-Efficient Backdoor Attacks. IJCAI 2022.

FUS Strategy

Figure: sample from the candidate poisoned dataset, filter by forgetting events, and update the poisoned samples.

Initialize the poisoned samples by random sampling. Then, for each step:

  • Filtering: train a backdoored model, record forgetting events, and filter out samples with low forgetting counts
  • Updating: replace the filtered-out samples by random sampling

 

Filtering-and-Updating (FUS) Strategy
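A schematic sketch of the FUS loop (illustrative Python; `count_forgetting_events` stands in for training a backdoored model and recording per-sample forgetting events, and is replaced by a random stub here so the sketch runs; `budget`, `steps`, and `filter_ratio` are assumed hyperparameters):

```python
import random

def count_forgetting_events(candidates):
    """Stub: in the real FUS strategy this trains a backdoored model and counts,
    per candidate sample, how often it is forgotten during training."""
    return {idx: random.randint(0, 10) for idx in candidates}

def fus_select(candidate_pool, budget, steps=5, filter_ratio=0.3):
    """Filtering-and-Updating Strategy: keep samples with high forgetting counts."""
    selected = random.sample(candidate_pool, budget)          # initialize randomly
    for _ in range(steps):
        events = count_forgetting_events(selected)
        # Filtering: drop the samples that are forgotten least often
        selected.sort(key=lambda idx: events[idx], reverse=True)
        kept = selected[: int(budget * (1 - filter_ratio))]
        # Updating: refill the freed slots by random sampling from the pool
        refill = random.sample([i for i in candidate_pool if i not in kept],
                               budget - len(kept))
        selected = kept + refill
    return selected

# Example: pick 100 poisoning indices out of 5000 candidates
indices = fus_select(list(range(5000)), budget=100)
```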

85 of 102

85

Hard poisoning samples are more important than easy ones.

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv 2023.

 

 

poisoning sample

benign sample

Learnable Poisoning Sample Selection Strategy (LPS) Strategy

86 of 102

86

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv 2023.

LPS Strategy: a min-max optimization over the poisoned and benign samples, alternating an outer optimization and an inner optimization at each step.

Learnable Poisoning Sample Selection Strategy (LPS) Strategy

87 of 102

87

One-stage training & Two-stage training

88 of 102

88

 

 

One-stage training & Two-stage training

89 of 102

89

The overview of Imperceptible Backdoor Attack

Imperceptible backdoor attack: From input space to feature representation. IJCAI 2022.

Optimize together

Imperceptible Backdoor Attack

90 of 102

90

Training process of Lira

Lira: Learnable, imperceptible and robust backdoor attacks

Lira: Learnable, imperceptible and robust backdoor attacks. ICCV 2021.

f: classification model

T: trigger generator

  • The trigger T and the classifier f are optimized iteratively.

91 of 102

91

Full access to training data & Partial access to training data

92 of 102

92

The attacker has full access to the training data, so any training sample can be manipulated.

The attacker does not have access to all of the data. For example, in distributed or federated learning (FL), each client can only access part of the training data.

Full access to training data & Partial access to training data

93 of 102

93

Backdoor with FL

Backdoors can be inserted into FL models, but they do not remain in the model after the attacker stops uploading poisoned updates.

Neurotoxin: Durable backdoors in federated learning. ICML 2022

Neurotoxin

94 of 102

94

Figure: benign clients and the adversarial client upload parameters to the server, which distributes the aggregated parameters back. The adversarial client inspects its gradient over neuron indices, filters out the neurons that are important for clean features, and uploads only the remaining gradient.

Neurotoxin: Durable backdoors in federated learning. ICML 2022

Neurotoxin
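A minimal sketch of this gradient-masking idea (illustrative NumPy code; the helper name `neurotoxin_mask`, the flattened-gradient representation, and `top_k_ratio` are assumptions, not the paper's exact implementation):

```python
import numpy as np

def neurotoxin_mask(malicious_grad: np.ndarray, benign_grad: np.ndarray,
                    top_k_ratio: float = 0.1) -> np.ndarray:
    """Zero out the coordinates that benign clients update most strongly,
    so the backdoor hides in rarely-updated parameters and persists
    after the attacker stops uploading poisoned updates."""
    k = int(top_k_ratio * benign_grad.size)
    top_k = np.argsort(np.abs(benign_grad).ravel())[-k:]   # most-used coordinates
    masked = malicious_grad.ravel().copy()
    masked[top_k] = 0.0                                     # project them out
    return masked.reshape(malicious_grad.shape)
```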

95 of 102

95

Full control of training & Partial control of training

96 of 102

96

The attacker has the chance to fully control the training process.

Sometimes the training process is separated into several stages handled by different trainers. Consequently, the attacker can only control part of the training process.

Full control of training & Partial control of training

97 of 102

97

Each layer adds a loss

pre-training (under control)

fine-tuning (out of control)

Backdoor attacks on pre-trained models by layerwise weight poisoning. EMNLP 2021

With the layer-wise loss, backdoor neurons are planted in shallow layers and are difficult to eliminate by fine-tuning

With ordinary poisoning, backdoor neurons sit in deep layers and are easy to eliminate by fine-tuning

LWP: Layer Weight Poisoning

98 of 102

98

PPT: Poisoned Prompt Tuning

The overview of PPT

PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning. IJCAI 2022.

poisoned text

prompt

original text

99 of 102

99

Different components of the training procedure

100 of 102

100

Backdoor Attack Mode

  • Clean mode: the network is encouraged to correctly recognize clean images.
  • Attack mode: the attack should be activated on poisoned data.
  • Cross-trigger mode: data poisoned with other triggers or noise should not activate the backdoor.

training algorithm

training loss

training order

Classification loss under cross-trigger mode:

Input-Aware dynamic backdoor attack. NeurIPS 2020

Input-Aware
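A plausible form of the cross-trigger loss (notation assumed, not copied from the paper: T(·) is the input-aware trigger generator, ⊕ the trigger-injection operator, and y_i the clean label): an input patched with a trigger generated for a different input must keep its original label.

```latex
\mathcal{L}_{\text{cross}} = \mathbb{E}_{i \ne j}\;
  \mathcal{L}_{\text{CE}}\big( f(x_i \oplus T(x_j)),\; y_i \big)
```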

101 of 102

101

Lira: Learnable, imperceptible and robust backdoor attacks

Lira: Learnable, imperceptible and robust backdoor attacks. ICCV 2021

The objective function:
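A plausible bi-level form of this objective, consistent with the iterative optimization of the trigger generator T and classifier f described on the earlier Lira slide (notation assumed: η(y) is the target-label mapping, α a weighting term, ε the perceptibility budget):

```latex
\min_{T} \sum_{(x,\,y)} \mathcal{L}\big( f_{\theta^*}(T(x)),\; \eta(y) \big)
\quad \text{s.t.} \quad
\theta^* = \arg\min_{\theta} \sum_{(x,\,y)}
  \Big[ \mathcal{L}\big( f_{\theta}(x),\; y \big)
      + \alpha\, \mathcal{L}\big( f_{\theta}(T(x)),\; \eta(y) \big) \Big],
\qquad \| T(x) - x \| \le \epsilon
```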

training algorithm

training loss

training order

102 of 102

102

Overview of BOB

training algorithm

training loss

training order

Manipulating SGD with Data Ordering Attacks. NeurIPS 2021

The surrogate model trained on poisoned/adversarial samples provides gradient guidance

BOB: Batch-Order Backdoor

Manipulating the batch order to approximate the guided gradient