AI Security Portal K Program
Unleashing the Power of LLM to Infer State Machine from the Protocol Implementation
Abstract
State machines are essential for protocol analysis aimed at identifying vulnerabilities. However, inferring state machines from network protocol implementations is challenging because of the complex syntax and semantics of the code. Traditional dynamic analysis methods often miss critical state transitions due to limited coverage, while static analysis suffers from path explosion. To overcome these challenges, we introduce ProtocolGPT, a novel state machine inference approach based on Large Language Models (LLMs). The method uses retrieval-augmented generation (RAG) to enhance a pre-trained model with specific knowledge from protocol implementations. Through effective prompt engineering, we accurately identify and infer state machines. To the best of our knowledge, ours is the first state machine inference approach that leverages the source code of protocol implementations. Our evaluation on six protocol implementations shows that our method achieves a precision of over 90%, outperforming the baselines by more than 30%. Furthermore, integrating our approach with protocol fuzzing improves coverage by more than 20% over baseline methods and uncovers two 0-day vulnerabilities.
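To make the retrieval step concrete, the following is a minimal sketch of how source-code chunks might be selected and placed into a prompt. It is not the ProtocolGPT implementation: the abstract does not describe the retriever, so a simple term-overlap cosine score stands in for an embedding-based vector search, and the names `CHUNKS`, `QUERY`, `retrieve`, and `build_prompt`, as well as the C snippets, are hypothetical. The call to the language model itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical code chunks, as if split out of a protocol implementation.
CHUNKS = [
    "case STATE_INIT: if (msg == MSG_HELLO) state = STATE_AUTH; break;",
    "static int compute_checksum(const unsigned char *buf, size_t len) { return 0; }",
    "case STATE_AUTH: if (msg == MSG_LOGIN) state = STATE_READY; break;",
]

# Hypothetical question posed to the model.
QUERY = "Which state transitions does each message trigger?"


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(chunks, query, k=2):
    """Return the k chunks whose term counts are most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks, key=lambda c: cosine(Counter(tokenize(c)), q), reverse=True
    )
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble a prompt that grounds the question in the retrieved code."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "You are analyzing a network protocol implementation.\n"
        f"Relevant code:\n{context}\n\n"
        f"Question: {query}\n"
        "List each transition as (source state, message, target state)."
    )


if __name__ == "__main__":
    top = retrieve(CHUNKS, QUERY, k=2)
    print(build_prompt(QUERY, top))
```

In this sketch only the chunks that mention states are passed to the model, which illustrates the general idea of supplying implementation-specific context that fits within a limited prompt. A production retriever would more plausibly use learned embeddings and a vector index.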