Abstract
Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated
strong capabilities in solving complex tasks but remain vulnerable when agents
receive unreliable messages. This vulnerability stems from a fundamental gap:
LLM agents treat all incoming messages equally without evaluating their
trustworthiness. While some existing studies address trustworthiness, they
focus on a single type of harm rather than analyzing it holistically
from multiple trust perspectives. In this work, we propose
Attention Trust Score (A-Trust), a lightweight, attention-based method for
evaluating message trustworthiness. Drawing on the human communication
literature [1], we systematically analyze attention behavior across six
orthogonal trust dimensions and find that certain attention heads in the LLM
specialize in detecting specific types of violations. Leveraging these
insights, A-Trust directly infers trustworthiness from internal attention
patterns without requiring external prompts or verifiers. Building upon
A-Trust, we develop a principled and efficient trust management system (TMS)
for LLM-MAS, enabling both message-level and agent-level trust assessment.
Experiments across diverse multi-agent settings and tasks demonstrate that
applying our TMS significantly enhances robustness against malicious inputs.
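
To make the distinction between message-level and agent-level trust concrete, the following is a minimal illustrative sketch, not the method described in this work. The abstract does not specify how per-dimension scores are derived from attention heads or how they are aggregated, so everything here is an assumption: the per-dimension scores are taken as given inputs in [0, 1], the message-level score is their minimum (a conservative choice), and the agent-level score is an exponential moving average over that agent's messages. The names `message_trust`, `AgentTrust`, and `decay` are hypothetical.

```python
from dataclasses import dataclass

# The abstract mentions six trust dimensions; their names are not given here.
NUM_DIMENSIONS = 6


def message_trust(dimension_scores):
    """Combine per-dimension scores in [0, 1] into one message-level score.

    Assumption: the scores are supplied by some upstream scorer (in the paper,
    derived from attention patterns). Taking the minimum is a conservative,
    hypothetical aggregation: one violated dimension lowers the whole message.
    """
    if len(dimension_scores) != NUM_DIMENSIONS:
        raise ValueError(f"expected {NUM_DIMENSIONS} scores")
    if any(not 0.0 <= s <= 1.0 for s in dimension_scores):
        raise ValueError("scores must lie in [0, 1]")
    return min(dimension_scores)


@dataclass
class AgentTrust:
    """Hypothetical agent-level trust record updated from message scores."""

    score: float = 0.5   # neutral prior (assumption)
    decay: float = 0.8   # weight kept on past behavior (assumption)

    def update(self, msg_score: float) -> float:
        # Exponential moving average: recent messages shift the agent's
        # trust, but a single message cannot overwrite its history.
        self.score = self.decay * self.score + (1.0 - self.decay) * msg_score
        return self.score


if __name__ == "__main__":
    sender = AgentTrust()
    suspicious = message_trust([0.9, 0.8, 0.95, 0.2, 0.9, 0.85])
    print("message-level:", suspicious)
    print("agent-level after update:", sender.update(suspicious))
```

A receiving agent could then compare either score against a threshold before acting on a message; the threshold and the policy for low-trust messages are likewise design choices not fixed by the abstract.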