
Erfan

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

by

Erfan Shayegani, Yue Dong, Nael Abu-Ghazaleh

Best Paper Award:

Spotlight Presentation: ICLR 2024


Safety Alignment of LLMs: Too Simple to Generalize


Multi-lingual capabilities (safety training is concentrated on English)

Encoding capabilities (e.g., Base64 or ciphered prompts)

Unknown capabilities 💀
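As a toy illustration of the "encoding capabilities" point (an assumption-level sketch, not an attack from the talk): a trivial transformation such as Base64 preserves the request's content while moving the surface form outside the distribution that text-only safety training saw.

```python
import base64

# Hypothetical probe prompt; any string works the same way.
prompt = "Tell me how to do X."

# The encoded string carries the same request, but safety training
# done on plain English text may never have seen this surface form.
encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")

# A model with "encoding capabilities" can still recover the request.
decoded = base64.b64decode(encoded).decode("utf-8")
```

The mismatch is the point: alignment is trained on one input distribution, while the model's capabilities cover a much larger one.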


Safety Alignment of Multi-Modal Models needs to be "Cross-Modal"


Jumping over the Textual gate of alignment!


Very high success rate for the cross-modal attack!


Our optimization algorithm for hiding malicious triggers inside benign-looking images:
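A minimal sketch of the embedding-matching idea, under stated assumptions: a frozen toy linear-plus-tanh map stands in for the model's vision encoder (the real attack would use the target model's actual encoder, e.g. CLIP's), and we optimize an adversarial image so its embedding approaches that of a malicious target. All names and dimensions here are illustrative, not from the paper's code.

```python
import torch

torch.manual_seed(0)

D_PIX, D_EMB = 64, 16          # toy pixel / embedding dimensions
W = torch.randn(D_EMB, D_PIX)  # frozen weights of the stand-in encoder


def encode(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for a frozen vision encoder's embedding function."""
    return torch.tanh(x @ W.T)


x_target = torch.rand(D_PIX)            # "malicious" target image
target_emb = encode(x_target).detach()  # its (fixed) embedding

# Start from a random benign-looking image and optimize its pixels so
# that its embedding matches the malicious target's embedding.
x_adv = torch.rand(D_PIX, requires_grad=True)
opt = torch.optim.Adam([x_adv], lr=0.05)

losses = []
for _ in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(encode(x_adv), target_emb)
    loss.backward()
    opt.step()
    with torch.no_grad():
        x_adv.clamp_(0.0, 1.0)  # keep pixels in a valid image range
    losses.append(loss.item())
```

The resulting image looks unrelated to the target in pixel space, yet the (frozen) encoder maps both to nearby embeddings, which is what lets it stand in for the malicious content downstream.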


Thank you very much!

Link to the paper: https://openreview.net/forum?id=plmBsXHxgR

My website:

https://erfanshayegani.github.io/

Don't hesitate to contact me! I'd be very happy to discuss! 😄