- This work has proposed a sparse attack that yields interpretable adversarial examples, owing to their high sparsity.
- Our key idea is to approximate the NP-hard sparsity optimization problem via a theoretically sound reparameterization technique, which makes direct optimization of sparse perturbations tractable.
- Our approach outperforms sparse attack counterparts in computational efficiency, transferability, and attack intensity.
- Our approach reveals two types of adversarial perturbations, empowering us to interpret how adversarial examples mislead DNNs into incorrect decisions.
Towards Interpretable Adversarial Examples
via Sparse Adversarial Attack
Fudong Lin1, Jiadong Lou1, Hao Wang2, Brian Jalaian3, Xu Yuan1
1 University of Delaware, 2 Stevens Institute of Technology, 3 University of West Florida
- Dense Attacks: Hard to interpret, since their adversarial examples are overly perturbed
- Sparse Attacks: Face an NP-hard optimization problem, and their resultant adversarial examples suffer from poor sparsity
- Our Goal: Develop a sparse attack that yields interpretable adversarial examples, allowing us to interpret the vulnerability of Deep Neural Networks (DNNs)
Interpretable Adversarial Attacks
- Explainable AI: Grad-CAM and Guided Grad-CAM (a minimal Grad-CAM sketch follows this list)
- Two Types of Malicious Noise: “obscuring noise”, which prevents DNNs from identifying the true label, and “leading noise”, which misleads DNNs into incorrect decisions
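The snippet below is a minimal Grad-CAM sketch for reference, not the poster's exact tooling; the pretrained ResNet-50 and the choice of its final convolutional block (`model.layer4`) are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Assumed setup: a pretrained ResNet-50 and its last convolutional block.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target_layer = model.layer4

activations, gradients = {}, {}

def forward_hook(module, inputs, output):
    activations["value"] = output.detach()

def backward_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

target_layer.register_forward_hook(forward_hook)
target_layer.register_full_backward_hook(backward_hook)

def grad_cam(x, class_idx=None):
    """Return an [H, W] heatmap for a preprocessed [1, 3, H, W] input x."""
    logits = model(x)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    # Channel weights: globally averaged gradients of the class score.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)  # [1, C, 1, 1]
    cam = F.relu((weights * activations["value"]).sum(dim=1))    # keep positive evidence
    cam = F.interpolate(cam.unsqueeze(1), size=x.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)                              # normalize to [0, 1]
```

Overlaying such heatmaps on a clean input and its adversarial counterpart is one way to visualize which regions the “obscuring” and “leading” noise act upon.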
Interpret the Vulnerability of DNNs
Intractable Sparse Optimization
- Reformulate the loss function by using the Heaviside step function (sketched below)
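The poster's exact equation is not reproduced here; the following is a minimal sketch, assuming the standard $\ell_0$ sparse-attack objective in which the $\ell_0$ penalty on the perturbation $\delta$ is written element-wise through the Heaviside step function $H(\cdot)$:

```latex
% Hedged sketch (notation assumed, not copied from the poster): the l0 penalty
% counts nonzero perturbation entries via the Heaviside step function H.
\begin{align}
  \min_{\delta}\;\;
    \mathcal{L}\bigl(f(x+\delta),\, y\bigr) + \lambda\,\|\delta\|_{0},
  \qquad
  \|\delta\|_{0} = \sum_{i} H\bigl(|\delta_i|\bigr),
  \qquad
  H(z) =
  \begin{cases}
    1, & z > 0,\\
    0, & z \le 0.
  \end{cases}
\end{align}
```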
Approximation of Sparse Optimization
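Because the Heaviside step has zero gradient almost everywhere, the counting objective above cannot be optimized directly. The paper's exact reparameterization is not reproduced here; the sketch below swaps in a temperature-controlled sigmoid surrogate as an assumed stand-in, only to illustrate how a differentiable approximation makes gradient-based optimization of sparse perturbations possible. The function name `sparse_attack_sketch` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def sparse_attack_sketch(model, x, y, steps=200, lam=0.05, tau=500.0, eps=0.01, lr=0.05):
    """Optimize a perturbation whose l0 penalty is replaced by a smooth sigmoid
    surrogate (an assumption for illustration, not the poster's exact method).
    `x` is a batch of images in [0, 1]; `y` holds the ground-truth class indices."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (x + delta).clamp(0.0, 1.0)
        logits = model(adv)
        # Untargeted objective: push the prediction away from the true label y.
        attack_loss = -F.cross_entropy(logits, y)
        # Smooth stand-in for ||delta||_0 = sum_i H(|delta_i|):
        # sigmoid(tau * (|delta_i| - eps)) approaches the Heaviside step as tau grows.
        sparsity = torch.sigmoid(tau * (delta.abs() - eps)).sum()
        loss = attack_loss + lam * sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta.detach()).clamp(0.0, 1.0)
```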
[Figure: adversarial examples misclassifying “candle” as “toilet tissue” and “canoe” as “wings”]
Comparison to Sparse Attack Counterparts
- Our approach achieves the best attack success rates when attacking robust models trained with PGD-AT or Fast-AT.