Paper Information
- Author
- Jiongchi Yu,Xiaofei Xie,Qiang Hu,Yuhan Ma,Ziming Zhao
- Published
- 8-11-2025
- Updated
- 8-12-2025
- Affiliation
- Singapore Management University
- Country
- Singapore
- Conference
- Computing Research Repository (CoRR)
Abstract
Insider threats, which can lead to severe losses, remain a major security
concern. While machine learning-based insider threat detection (ITD) methods
have shown promising results, their progress is hindered by the scarcity of
high-quality data. Enterprise data is sensitive and rarely accessible, while
publicly available datasets, when limited in scale due to cost, lack sufficient
real-world coverage; and when purely synthetic, they fail to capture rich
semantics and realistic user behavior. To address this, we propose Chimera, the
first large language model (LLM)-based multi-agent framework that automatically
simulates both benign and malicious insider activities and collects diverse
logs across diverse enterprise environments. Chimera models each employee with
agents that have role-specific behavior and integrates modules for group
meetings, pairwise interactions, and autonomous scheduling, capturing realistic
organizational dynamics. It incorporates 15 types of insider attacks (e.g., IP
theft, system sabotage) and has been deployed to simulate activities in three
sensitive domains: technology company, finance corporation, and medical
institution, producing a new dataset, ChimeraLog. We assess ChimeraLog via
human studies and quantitative analysis, confirming its diversity, realism, and
presence of explainable threat patterns. Evaluations of existing ITD methods
show an average F1-score of 0.83, which is significantly lower than 0.99 on the
CERT dataset, demonstrating ChimeraLog's higher difficulty and utility for
advancing ITD research.