Abstract
Insider threats are a growing organizational problem, owing to the complexity of
identifying their technical and behavioral elements. A large body of research is
dedicated to the study of insider threats from technological, psychological,
and educational perspectives. However, research in this domain has generally
depended on static datasets with limited access, which restricts the
development of adaptive detection models. This study introduces a
novel, ethically grounded approach that uses the large language model (LLM)
Claude Sonnet 3.7 to dynamically synthesize syslog messages, some of which
contain indicators of insider threat scenarios. To reflect real-world data
distributions, the messages are highly imbalanced (1% insider threats). The syslogs
were analyzed for insider threats by both Sonnet 3.7 and GPT-4o, and their
performance was evaluated with statistical metrics including accuracy,
precision, recall, F1 score, specificity, false alarm rate (FAR), Matthews
correlation coefficient (MCC), and area under the ROC curve (ROC AUC). Sonnet 3.7
consistently outperformed GPT-4o across nearly all metrics, particularly in
reducing false alarms and improving detection accuracy. The results suggest
that LLMs hold strong promise for both synthetic dataset generation and insider
threat detection.
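The abstract lists eight evaluation metrics without defining them. The sketch below computes them from the standard confusion-matrix definitions; it is an illustration, not the authors' evaluation code. Assumptions: FAR is taken to be the false alarm rate FP / (FP + TN), consistent with the abstract's emphasis on false alarms; ROC AUC is computed from hypothetical per-message suspicion scores using the pairwise (Mann-Whitney) formulation; the helper names (`confusion_counts`, `threshold_metrics`, `roc_auc`) and the 1,000-message example are hypothetical.

```python
import math

# Binary labels: 1 = insider threat message, 0 = benign (assumed encoding).


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for parallel lists of 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero (e.g. no alerts raised)."""
    return num / den if den else 0.0


def threshold_metrics(y_true, y_pred):
    """Metrics computed from hard (thresholded) predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)       # sensitivity / true positive rate
    specificity = safe_div(tn, tn + fp)  # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "specificity": specificity,
        # False alarm rate; equals 1 - specificity whenever negatives exist.
        "far": safe_div(fp, fp + tn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive scores higher than a
    random negative, counting ties as half. O(P*N), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")  # undefined without both classes
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical batch: 1,000 messages, 10 threats (1%), mirroring the imbalance.
    y_true = [1] * 10 + [0] * 990
    # A detector that never alerts: accuracy 0.99, recall 0.0, mcc 0.0
    print(threshold_metrics(y_true, [0] * 1000))
    # A hypothetical detector catching 8 of 10 threats with 5 false alarms.
    y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 985
    print(threshold_metrics(y_true, y_pred))
```

Under 1% prevalence, a detector that never raises an alert still reaches 0.99 accuracy while its recall and MCC are both 0, which illustrates why imbalance-aware metrics such as recall, FAR, and MCC are informative alongside accuracy.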