Erfan
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
by
Erfan Shayegani, Yue Dong, Nael Abu-Ghazaleh
Best Paper Award
Spotlight Presentation: ICLR 2024
Safety Alignment of LLMs: Too Simple to Generalize
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks — Shayegani et al., 2023
Safety Alignment of LLMs: Too Simple to Generalize
Multi-lingual capabilities
Encoding capabilities
Unknown capabilities 💀
Safety Alignment of Multi-Modal Models needs to be "Cross-Modal"
Jumping over the textual gate of alignment!
Very high success rate for the cross-modal attack!
Our optimization algorithm to hide malicious images:
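The core idea behind hiding a malicious payload in a benign-looking image is embedding matching: nudge an image's pixels so that its embedding under the model's (frozen) vision encoder lands close to the embedding of a malicious target. The sketch below is only an illustration of that optimization loop; it uses a random linear map as a toy stand-in for a real vision encoder (the paper's attack targets an actual multi-modal encoder such as CLIP), and all names here are illustrative, not the paper's implementation.

```python
import numpy as np

# Toy stand-in for a frozen vision encoder: a fixed random linear map.
# A real attack would differentiate through the model's actual image
# encoder; this only illustrates the embedding-matching loop.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))        # "encoder": 16-pixel image -> 8-dim embedding

def encode(x):
    return W @ x

# Embedding of the (hypothetical) malicious target we want to mimic.
target_emb = encode(rng.normal(size=16))

# Start from a benign-looking image and nudge its pixels so that its
# embedding approaches the target embedding.
x = rng.normal(size=16)
init_loss = float(np.sum((encode(x) - target_emb) ** 2))

lr = 0.01
for _ in range(500):
    diff = encode(x) - target_emb
    grad = 2 * W.T @ diff           # gradient of ||W x - target||^2 w.r.t. x
    x -= lr * grad

final_loss = float(np.sum((encode(x) - target_emb) ** 2))
print(f"embedding distance^2: {init_loss:.2f} -> {final_loss:.8f}")
```

The optimized image looks unrelated to the target to a human, but the encoder maps it to (nearly) the same point in embedding space, so the downstream language model treats it like the malicious input.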
Thank you very much!
Link to the paper: https://openreview.net/forum?id=plmBsXHxgR
My website:
https://erfanshayegani.github.io/
Don't hesitate to contact me! I'd be very happy to discuss! 😄