SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2502.12562

PDF

https://arxiv.org/pdf/2502.12562

Paper Information

Author: Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng
Published: 2-18-2025
Updated: 5-22-2025
Affiliation: Shien-Ming Wu School of Intelligent Engineering, South China University of Technology
Country: China
Conference: Annual Meeting of the Association for Computational Linguistics (ACL)

Labels Estimated by AI

Alignment, Prompt Injection, Text Generation Method

These labels were automatically added by AI and may be inaccurate.

Abstract

Multimodal Large Language Models (MLLMs) have serious security vulnerabilities. While safety alignment using multimodal datasets consisting of text and data of additional modalities can effectively enhance the security of MLLMs, these datasets are costly to construct. Existing low-resource security alignment methods, including textual alignment, have been found to struggle with the security risks posed by additional modalities. To address this, we propose Synthetic Embedding augmented safety Alignment (SEA), which optimizes embeddings of the additional modality through gradient updates to expand textual datasets. This enables multimodal safety alignment training even when only textual data is available. Extensive experiments on image-, video-, and audio-based MLLMs demonstrate that SEA can synthesize a high-quality embedding on a single RTX3090 GPU within 24 seconds. SEA significantly improves the security of MLLMs when faced with threats from additional modalities. To assess the security risks introduced by video and audio, we also introduce a new benchmark called VA-SafetyBench. High attack success rates across multiple MLLMs confirm that the benchmark is challenging. Our code and data will be available at https://github.com/ZeroNLP/SEA.
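
The central idea in the abstract is to treat the embedding of the non-text modality as a trainable variable and update it by gradient descent while the model itself stays frozen. That idea can be illustrated with a small toy sketch. This is not the authors' implementation: the linear softmax "model", the additive fusion of the two embeddings, the cross-entropy loss, and every name used (`forward`, `optimize_embedding`, `W`, `text_emb`, `target`) are assumptions chosen for illustration only. The actual SEA objective and training code are described in the paper and its repository.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def forward(W, text_emb, synth_emb):
    """Frozen toy 'model' (assumption): a linear layer over the summed embeddings."""
    x = [t + s for t, s in zip(text_emb, synth_emb)]
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])


def optimize_embedding(W, text_emb, target, steps=300, lr=0.05, seed=0):
    """Gradient descent on the synthetic embedding only; W and text_emb stay fixed."""
    rng = random.Random(seed)
    dim = len(text_emb)
    synth = [rng.gauss(0.0, 0.01) for _ in range(dim)]  # small random start
    losses = []
    for _ in range(steps):
        probs = forward(W, text_emb, synth)
        losses.append(-math.log(probs[target]))  # cross-entropy for the target class
        # dL/dlogits = probs - onehot(target); dL/dsynth = W^T (dL/dlogits)
        dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        grad = [sum(W[k][j] * dlogits[k] for k in range(len(W))) for j in range(dim)]
        synth = [s - lr * g for s, g in zip(synth, grad)]
    return synth, losses


# Hypothetical toy setup: 4 output classes, 8-dimensional embeddings.
rng = random.Random(42)
W = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
text_emb = [rng.gauss(0.0, 1.0) for _ in range(8)]
synth_emb, losses = optimize_embedding(W, text_emb, target=2)
print(f"loss before: {losses[0]:.4f}, loss after: {losses[-1]:.4f}")
```

In this sketch only `synth_emb` changes between steps, which mirrors the low-resource setting the abstract describes: no image, video, or audio data is needed, only text and a frozen model to differentiate through.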

External Datasets

SafeRLHF

MM-SafetyBench

VA-SafetyBench