Abstract
Due to their unprecedented ability to process and respond to various types of
data, Multimodal Large Language Models (MLLMs) are continually redefining the
boundary of Artificial General Intelligence (AGI). As these advanced generative
models increasingly form collaborative networks for complex tasks, the
integrity and security of these systems are crucial. Our paper, ``The Wolf
Within'', explores a novel vulnerability in MLLM societies: the indirect
propagation of malicious content. Rather than eliciting harmful output directly
from an MLLM, we demonstrate how a single MLLM agent can be subtly
influenced to generate prompts that, in turn, induce other MLLM agents in the
society to output malicious content. Our findings reveal that an MLLM agent,
when manipulated to produce specific prompts or instructions, can effectively
``infect'' other agents within a society of MLLMs. This infection leads to the
generation and circulation of harmful outputs, such as dangerous instructions
or misinformation, across the society. We also show the transferability of
these indirectly generated prompts, highlighting their potential to
propagate malice through inter-agent communication. This research provides
critical insight into a new dimension of threat posed by MLLMs, where a single
agent can act as a catalyst for widespread malevolent influence. Our work
underscores the urgent need for developing robust mechanisms to detect and
mitigate such covert manipulations within MLLM societies, ensuring their safe
and ethical utilization in societal applications.