AI Security Portal K Program
GAMMAF: A Common Framework for Graph-Based Anomaly Monitoring Benchmarking in LLM Multi-Agent Systems
Abstract
The rapid integration of Large Language Models (LLMs) into Multi-Agent Systems (MAS) has significantly enhanced their collaborative problem-solving capabilities, but it has also expanded their attack surface, exposing them to vulnerabilities such as prompt infection and compromised inter-agent communication. While emerging graph-based anomaly detection methods show promise in protecting these networks, the field lacks a standardized, reproducible environment in which to train these models and evaluate their efficacy. To address this gap, we introduce GAMMAF (Graph-based Anomaly Monitoring for LLM Multi-Agent systems Framework), an open-source benchmarking platform. GAMMAF is not a novel defense mechanism itself; it is a comprehensive evaluation architecture designed to generate synthetic multi-agent interaction datasets and to benchmark existing and future defense models. The framework operates through two interdependent pipelines: a Training Data Generation stage, which simulates debates across varied network topologies and captures the interactions as robust attributed graphs, and a Defense System Benchmarking stage, which evaluates defense models by dynamically isolating flagged adversarial nodes during live inference rounds. Through evaluation with established defense baselines (XG-Guard and BlindGuard) across multiple knowledge tasks (such as MMLU-Pro and GSM8K), we demonstrate GAMMAF's utility, topological scalability, and execution efficiency. Furthermore, our experimental results reveal that equipping an LLM-MAS with effective attack remediation not only recovers system integrity but also substantially reduces overall operational costs, because it facilitates early consensus and cuts off the extensive token generation typical of adversarial agents.
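The node-isolation step described in the abstract can be pictured with a small sketch. This is not GAMMAF's code or API: the `Topology` type, `toy_detector`, and `isolate_flagged` are hypothetical names, and the score-threshold detector is only a stand-in for a real defense model such as the baselines named above. It assumes the agent network is a directed graph stored as adjacency sets.

```python
from typing import Dict, Set

# Hypothetical toy topology: agent id -> set of agents it sends messages to.
Topology = Dict[str, Set[str]]


def toy_detector(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Stand-in for a defense model: flag agents whose anomaly score exceeds the threshold."""
    return {agent for agent, score in scores.items() if score > threshold}


def isolate_flagged(topology: Topology, flagged: Set[str]) -> Topology:
    """Return a copy of the topology with every edge to or from a flagged agent removed.

    Flagged agents remain in the graph as isolated nodes; the input is not modified.
    """
    isolated: Topology = {}
    for agent, neighbors in topology.items():
        if agent in flagged:
            # A flagged agent may no longer send messages to anyone.
            isolated[agent] = set()
        else:
            # Other agents stop sending messages to flagged agents.
            isolated[agent] = {n for n in neighbors if n not in flagged}
    return isolated


if __name__ == "__main__":
    topology = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    scores = {"a": 0.1, "b": 0.9, "c": 0.2}  # made-up anomaly scores
    flagged = toy_detector(scores, threshold=0.5)
    print(isolate_flagged(topology, flagged))
```

In a benchmarking loop, a step like this would presumably run between inference rounds, so that later rounds proceed only over the remaining edges.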
Model Context Protocol (MCP) Specification
Anthropic
Published: 2024
Phantom: General Backdoor Attacks on Retrieval Augmented Language Generation
Harsh Chaudhari, Giorgio Severi, John Abascal, Anshuman Suri, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea
Published: 5.31.2024
Internet of Agents: Weaving a web of heterogeneous agents for collaborative intelligence
Chen, W., You, Z., Li, R., Guan, Y., Qian, C., Zhao, C., Yang, C., Xie, R., Liu, Z., Sun, M.
Published: 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
Published: 2.14.2024
Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems
Pengfei He, Zhenwei Dai, Xianfeng Tang, Yue Xing, Hui Liu, Jingying Zeng, Qiankun Peng, Shrivats Agrawal, Samarth Varshney, Suhang Wang, Jiliang Tang, Qi He
Published: 6.3.2025
Red-Teaming LLM Multi-Agent Systems via Communication Attacks
Pengfei He, Yupin Lin, Shen Dong, Han Xu, Yue Xing, Hui Liu
Published: 2.21.2025
SentinelAgent: Graph-based anomaly detection in multi-agent systems
He, X., Wu, D., Zhai, Y., Sun, K.
Published: 2025
Measuring massive multitask language understanding
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., Steinhardt, J.
Published: 2021
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa
Published: 12.8.2023
ChatGPT for good? On opportunities and challenges of large language models for education
Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, et al.
Published: 2023
Efficient memory management for large language model serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
Published: 2023
CAMEL: Communicative agents for "mind" exploration of large language model society
Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem
Published: 2023
GAIA: A benchmark for general AI assistants
Mialon, G., Fourrier, C., Wolf, T., LeCun, Y., Scialom, T.
Published: 2023
BlindGuard: Safeguarding LLM-based multi-agent systems under unknown attacks
Miao, R., Liu, Y., Wang, Y., Shen, X., Tan, Y., Dai, Y., Pan, S., Wang, X.
Published: 2025
Explainable and fine-grained safeguarding of LLM multi-agent systems via bi-level graph anomaly detection
Pan, J., Liu, Y., Miao, R., Ding, K., Zheng, Y., Nguyen, Q. V. H., Liew, A. W.-C., Pan, S.
Published: 2025
Scaling large language model-based multi-agent collaboration
Qian, C., Xie, Z., Wang, Y., Liu, W., Zhu, K., Xia, H., Dang, Y., Du, Z., Chen, W., Yang, C., Liu, Z., Sun, M.
Published: 2025
Industrial applications of large language models
Raza, M., Jahangir, Z., Riaz, M. B., Saeed, M. J., Sattar, M. A.
Published: 2025
TRiSM for agentic AI: A review of trust, risk, and security management in LLM-based agentic multi-agent systems
Raza, S., Sapkota, R., Karkee, M., Emmanouilidis, C.
Published: 2025
NeMo Guardrails: A toolkit for controllable and safe LLM applications with programmable rails
T. Rebedea, R. Dinu, M. Sreedhar, C. Parisien, J. Cohen
Published: 2023
CommonsenseQA: A question answering challenge targeting commonsense knowledge
Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant
Published: 2019
G-Safeguard: A topology-guided security lens and treatment on LLM-based multi-agent systems
Wang, S., Zhang, G., Yu, M., Wan, G., Meng, F., Guo, C., Wang, K., Wang, Y.
Published: 2025
AutoGen: Enabling next-gen LLM applications via multi-agent conversations
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu
Published: 2024
The rise and potential of large language model based agents: A survey
Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Zhang, B., Liao, Y., Shang, C., Cui, J., Xu, Y., Wen, X., Zheng, T., Zhou, W., Zhao, H., Gui, T., Zhang, Q., Huang, X.
Published: 2025
GuardAgent: Safeguard LLM agents via knowledge-enabled reasoning
Xiang, Z., Zheng, L., Li, Y., Hong, J., Li, Q., Xie, H., Zhang, J., Xiong, Z., Xie, C., Bastian, N. D.
Published: 2025
Who’s the mole? Modeling and detecting intention-hiding malicious agents in LLM-based multi-agent systems
Xie, Y., Zhu, C., Zhang, X., Zhu, T., Ye, D., Wang, M., Liu, C.
Published: 2025
Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS
Bingyu Yan, Ziyi Zhou, Xiaoming Zhang, Chaozhuo Li, Ruilin Zeng, Yirui Qi, Tianbo Wang, Litian Zhang
Published: 8.5.2025
Large language models in health care: Development, applications, and challenges
Yang, R., Tan, T. F., Lu, W., Thirunavukarasu, A. J., Ting, D. S. W., Liu, N.
Published: 2023
ReAct: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, Yuan Cao
Published: 2022
A survey on trustworthy LLM agents: Threats and countermeasures
Yu, M., Meng, F., Zhou, X., Wang, S., Mao, J., Pan, L., Chen, T., Wang, K., Li, X., Zhang, Y.
Published: 2025
NetSafe: Exploring the Topological Safety of Multi-agent Networks
Yu, M., Wang, S., Zhang, G., Mao, J., Yin, C., Liu, Q., Wen, Q., Wang, K., Wang, Y.
Published: 2024
Breaking agents: Compromising autonomous LLM agents through malfunction amplification
Zhang, B., Tan, Y., Shen, Y., Salem, A., Backes, M., Zannettou, S., Zhang, Y.
Published: 2025
From allies to adversaries: Manipulating LLM tool-calling through adversarial injection
Zhang, R., Wang, H., Wang, J., Li, M., Huang, Y., Wang, D., Wang, Q.
Published: 2025
MADE: Malicious agent detection for robust multi-agent collaborative perception
Zhao, Y., Xiang, Z., Yin, S., Pang, X., Wang, Y., Chen, S.
Published: 2024
GUARDIAN: Safeguarding LLM multi-agent collaborations with temporal graph modeling
Zhou, J., Wang, L., Yang, X.
Published: 2025
CORBA: Contagious recursive blocking attacks on multi-agent systems based on large language models
Zhou, Z., Li, Z., Zhang, J., Zhang, Y., Wang, K., Liu, Y., Guo, Q.
Published: 2025
GPTSwarm: Language agents as optimizable graphs
Zhuge, M., Wang, W., Kirsch, L., Faccio, F., Khizbullin, D., Schmidhuber, J.
Published: 2024