Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

Source

https://arxiv.org/abs/2506.02546

PDF

https://arxiv.org/pdf/2506.02546

Paper Information

Authors: Pengfei He, Zhenwei Dai, Xianfeng Tang, Yue Xing, Hui Liu, Jingying Zeng, Qiankun Peng, Shrivats Agrawal, Samarth Varshney, Suhang Wang, Jiliang Tang, Qi He
Published: 2025-06-03
Affiliation: Michigan State University
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Ethical Considerations, Model DoS, Indirect Prompt Injection

These labels were automatically added by AI and may be inaccurate.

Abstract

Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. While some existing studies address trustworthiness, they focus on a single type of harmfulness rather than analyzing it holistically from multiple trustworthiness perspectives. In this work, we propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness. Inspired by the human communication literature [1], we systematically analyze attention behaviors across six orthogonal trust dimensions and find that certain attention heads in the LLM specialize in detecting specific types of violations. Leveraging these insights, A-Trust directly infers trustworthiness from internal attention patterns without requiring external prompts or verifiers. Building upon A-Trust, we develop a principled and efficient trust management system (TMS) for LLM-MAS, enabling both message-level and agent-level trust assessment. Experiments across diverse multi-agent settings and tasks demonstrate that applying our TMS significantly enhances robustness against malicious inputs.
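
The abstract distinguishes message-level from agent-level trust assessment but gives no formulas. The sketch below is only an illustration of how the two levels could relate, not the authors' method. Every name in it (`message_trust`, `TrustLedger`, `TRUST_THRESHOLD`), the placeholder dimension keys, the min-based aggregation, the averaging rule, and the neutral prior of 0.5 are assumptions made for this example. In the paper the per-dimension scores are derived from attention heads; here they are supplied as plain numbers in [0, 1].

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cutoff; the abstract does not specify a threshold.
TRUST_THRESHOLD = 0.5


def message_trust(dimension_scores):
    """Collapse per-dimension scores (each in [0, 1]) into one message-level score.

    Taking the minimum is an assumption of this sketch: a message is treated
    as only as trustworthy as its weakest dimension.
    """
    return min(dimension_scores.values())


class TrustLedger:
    """Stores message-level scores per agent and derives agent-level trust."""

    def __init__(self):
        self._history = defaultdict(list)

    def record(self, agent_id, dimension_scores):
        """Score one message, remember it for its sender, and return the score."""
        score = message_trust(dimension_scores)
        self._history[agent_id].append(score)
        return score

    def agent_trust(self, agent_id):
        """Average of the agent's message scores; 0.5 (assumed prior) if none yet."""
        scores = self._history.get(agent_id, [])
        return mean(scores) if scores else 0.5

    def accept(self, agent_id, dimension_scores):
        """Record the message and accept it only if it clears the threshold."""
        return self.record(agent_id, dimension_scores) >= TRUST_THRESHOLD


ledger = TrustLedger()
# Placeholder dimension keys; the abstract does not name the six dimensions.
ok = ledger.accept("agent_a", {"d1": 0.9, "d2": 0.8, "d3": 0.7})
bad = ledger.accept("agent_b", {"d1": 0.9, "d2": 0.1, "d3": 0.8})
print(ok, bad, ledger.agent_trust("agent_b"))
```

A receiving agent could use `accept` to drop individual messages and `agent_trust` to down-weight senders whose messages repeatedly score poorly. Both policies are design choices of this sketch rather than details taken from the paper.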

External Datasets

Trust Violation

MMLU

StrategyQA

MATH-500

MBPP