GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems

Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, et al.

Published: 2023

Efficient memory management for large language model serving with PagedAttention

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

Published: 2023

arXiv

Computing Research Repository (CoRR)

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Donghyun Lee, Mo Tiwari

Published: 2024-10-09

As Large Language Models (LLMs) grow increasingly powerful, multi-agent systems are becoming more prevalent in modern AI applications. Most safety research, however, has focused on vulnerabilities in single-agent LLMs. These include prompt injection attacks, where malicious prompts embedded in external content trick the LLM into executing unintended or harmful actions, compromising the victim's application. In this paper, we reveal a more dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents, behaving much like a computer virus. This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption, all while propagating silently through the system. Our extensive experiments demonstrate that multi-agent systems are highly susceptible, even when agents do not publicly share all communications. To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread. This work underscores the urgent need for advanced security measures as multi-agent LLM systems become more widely adopted.

Advances in Neural Information Processing Systems

CAMEL: Communicative agents for "mind" exploration of large language model society

Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem

Published: 2023

The Twelfth International Conference on Learning Representations

GAIA: a benchmark for general AI assistants

Mialon, G., Fourrier, C., Wolf, T., LeCun, Y., Scialom, T.

Published: 2023

BlindGuard: Safeguarding LLM-based multi-agent systems under unknown attacks

Miao, R., Liu, Y., Wang, Y., Shen, X., Tan, Y., Dai, Y., Pan, S., Wang, X.

Published: 2025

gpt-oss-120b & gpt-oss-20b model card

OpenAI, Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck

Published: 2025

Explainable and fine-grained safeguarding of LLM multi-agent systems via bi-level graph anomaly detection

Pan, J., Liu, Y., Miao, R., Ding, K., Zheng, Y., Nguyen, Q. V. H., Liew, A. W.-C., Pan, S.

Published: 2025

arXiv

Communicative agents for software development

Chen Qian, Xin Cong, Cheng Yang, Weize Chen, Yusheng Su, Juyuan Xu, Zhiyuan Liu, Maosong Sun

Published: 2023

The Thirteenth International Conference on Learning Representations

Scaling large language model-based multi-agent collaboration

Qian, C., Xie, Z., Wang, Y., Liu, W., Zhu, K., Xia, H., Dang, Y., Du, Z., Chen, W., Yang, C., Liu, Z., Sun, M.

Published: 2025

Scientific Reports

Industrial applications of large language models

Raza, M., Jahangir, Z., Riaz, M. B., Saeed, M. J., Sattar, M. A.

Published: 2025

TRiSM for agentic AI: A review of trust, risk, and security management in LLM-based agentic multi-agent systems

Raza, S., Sapkota, R., Karkee, M., Emmanouilidis, C.

Published: 2025

NeMo Guardrails: A toolkit for controllable and safe LLM applications with programmable rails

T. Rebedea, R. Dinu, M. Sreedhar, C. Parisien, J. Cohen

Published: 2023

Multi-agent collaboration: Harnessing the power of intelligent LLM agents

Y. Talebirad, A. Nadiri

Published: 2023

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

CommonsenseQA: A question answering challenge targeting commonsense knowledge

Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant

Published: 2019

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

G-Safeguard: A topology-guided security lens and treatment on LLM-based multi-agent systems

Wang, S., Zhang, G., Yu, M., Wan, G., Meng, F., Guo, C., Wang, K., Wang, Y.

Published: 2025

Advances in Neural Information Processing Systems 38

MMLU-Pro: A more robust and challenging multi-task language understanding benchmark

Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

Published: 2024

First Conference on Language Modeling

AutoGen: Enabling next-gen LLM applications via multi-agent conversations

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu

Published: 2024

Science China Information Sciences

The rise and potential of large language model based agents: A survey

Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Zhang, B., Liao, Y., Shang, C., Cui, J., Xu, Y., Wen, X., Zheng, T., Zhou, W., Zhao, H., Gui, T., Zhang, Q., Huang, X.

Published: 2025

ICML 2025 workshop on computer use agents

GuardAgent: safeguard LLM agents via knowledge-enabled reasoning

Xiang, Z., Zheng, L., Li, Y., Hong, J., Li, Q., Xie, H., Zhang, J., Xiong, Z., Xie, C., Bastian, N. D.

Published: 2025

Who’s the mole? Modeling and detecting intention-hiding malicious agents in LLM-based multi-agent systems

Xie, Y., Zhu, C., Zhang, X., Zhu, T., Ye, D., Wang, M., Liu, C.

Published: 2025

arXiv

Computing Research Repository (CoRR)

Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS

Bingyu Yan, Ziyi Zhou, Xiaoming Zhang, Chaozhuo Li, Ruilin Zeng, Yirui Qi, Tianbo Wang, Litian Zhang

Published: 2025-08-05

Large language model-based multi-agent systems (LLM-MAS) effectively accomplish complex and dynamic tasks through inter-agent communication, but this reliance introduces substantial safety vulnerabilities. Existing attack methods targeting LLM-MAS either compromise agent internals or rely on direct and overt persuasion, which limit their effectiveness, adaptability, and stealthiness. In this paper, we propose MAST, a Multi-round Adaptive Stealthy Tampering framework designed to exploit communication vulnerabilities within the system. MAST integrates Monte Carlo Tree Search with Direct Preference Optimization to train an attack policy model that adaptively generates effective multi-round tampering strategies. Furthermore, to preserve stealthiness, we impose dual semantic and embedding similarity constraints during the tampering process. Comprehensive experiments across diverse tasks, communication architectures, and LLMs demonstrate that MAST consistently achieves high attack success rates while significantly enhancing stealthiness compared to baselines. These findings highlight the effectiveness, stealthiness, and adaptability of MAST, underscoring the need for robust communication safeguards in LLM-MAS.

Health Care Science

Large language models in health care: Development, applications, and challenges

Yang, R., Tan, T. F., Lu, W., Thirunavukarasu, A. J., Ting, D. S. W., Liu, N.

Published: 2023

The Eleventh International Conference on Learning Representations

ReAct: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, Yuan Cao

Published: 2022

Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2

A survey on trustworthy LLM agents: Threats and countermeasures

Yu, M., Meng, F., Zhou, X., Wang, S., Mao, J., Pan, L., Chen, T., Wang, K., Li, X., Zhang, Y.

Published: 2025

NetSafe: Exploring the Topological Safety of Multi-agent Networks

Yu, M., Wang, S., Zhang, G., Mao, J., Yin, C., Liu, Q., Wen, Q., Wang, K., Wang, Y.

Published: 2024

arXiv preprint

AutoDefense: Multi-agent LLM defense against jailbreak attacks

Zeng, Y., Wu, Y., Zhang, X., Wang, H., Wu, Q.

Published: 2024

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Breaking agents: Compromising autonomous LLM agents through malfunction amplification

Zhang, B., Tan, Y., Shen, Y., Salem, A., Backes, M., Zannettou, S., Zhang, Y.

Published: 2025

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

From allies to adversaries: Manipulating LLM tool-calling through adversarial injection

Zhang, R., Wang, H., Wang, J., Li, M., Huang, Y., Wang, D., Wang, Q.

Published: 2025

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

MADE: Malicious agent detection for robust multi-agent collaborative perception

Zhao, Y., Xiang, Z., Yin, S., Pang, X., Wang, Y., Chen, S.

Published: 2024

GUARDIAN: Safeguarding LLM multi-agent collaborations with temporal graph modeling

Zhou, J., Wang, L., Yang, X.

Published: 2025

CORBA: Contagious recursive blocking attacks on multi-agent systems based on large language models

Zhou, Z., Li, Z., Zhang, J., Zhang, Y., Wang, K., Liu, Y., Guo, Q.

Published: 2025

Forty-first International Conference on Machine Learning

GPTSwarm: Language agents as optimizable graphs

Zhuge, M., Wang, W., Kirsch, L., Faccio, F., Khizbullin, D., Schmidhuber, J.

Published: 2024