AI Security Portal — K Program
Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights
Abstract
Large Language Models (LLMs) are increasingly explored for cybersecurity applications such as vulnerability detection. In the domain of threat modelling, prior work has primarily evaluated general-purpose LLMs under limited prompting settings. In this study, we extend research on structured threat modelling by systematically comparing domain-adapted language models of different sizes against their general-purpose counterparts. We use both LLMs and Small Language Models (SLMs) that were domain-adapted to telecommunications and cybersecurity. For structured threat modelling we adopt the widely used STRIDE approach, with 5G security as the application area. We present a comprehensive empirical evaluation across 52 configurations (spanning 8 language models) to analyze the impact of 1) domain adaptation, 2) model scale, 3) decoding strategy (greedy vs. stochastic sampling), and 4) prompting technique on STRIDE threat classification. Our results show that domain-adapted models do not consistently outperform their general-purpose counterparts, and that decoding strategy significantly affects model behavior and output validity. They also show that while larger models generally achieve higher performance, these gains are neither consistent nor sufficient for reliable threat modelling. These findings highlight fundamental limitations of current LLMs for structured threat modelling and suggest that improvement requires more than additional training data or model scaling, motivating task-specific reasoning and stronger grounding in security concepts. We analyze the invalid outputs encountered and offer prompting suggestions tailored specifically to STRIDE threat modelling.