CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation
Abstract
Fuzzing is an effective bug-finding technique, but it struggles with complex systems such as JavaScript engines, which demand grammatically precise input. To address this problem, researchers have recently adopted language models for context-aware mutation in fuzzing. However, existing techniques make limited use of coverage guidance and instead fuzz largely in a black-box manner. This paper presents CovRL (Coverage-guided Reinforcement Learning), a novel technique that combines Large Language Models (LLMs) with reinforcement learning from coverage feedback. Our fuzzer, CovRL-Fuzz, integrates coverage feedback directly into the LLM by using the Term Frequency-Inverse Document Frequency (TF-IDF) method to construct a weighted coverage map. This map is used to calculate the fuzzing reward, which is then applied to the LLM-based mutator through reinforcement learning. As a result, CovRL-Fuzz generates test cases that are more likely to reach new coverage, improving vulnerability detection while minimizing syntax and semantic errors, all without extra post-processing. Our evaluation indicates that CovRL-Fuzz outperforms state-of-the-art fuzzers in both code coverage and bug finding: it identified 48 real-world security-related bugs in the latest JavaScript engines, including 39 previously unknown vulnerabilities and 11 CVEs.
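The idea of a TF-IDF-weighted coverage map feeding a reward can be illustrated with a minimal sketch. This is not the paper's actual formula: the function names (`idf_weights`, `coverage_reward`), the representation of coverage as sets of integer edge IDs, the smoothed IDF, the normalization, and the fixed penalty of -1.0 for invalid inputs are all assumptions made for illustration only.

```python
import math
from collections import Counter


def idf_weights(coverage_history, num_edges):
    """Hypothetical helper: one smoothed IDF weight per coverage edge.

    coverage_history is a list of sets, one set of edge IDs per past test case.
    Edges hit by few past test cases receive larger weights; an edge hit by
    every past test case receives weight 0.
    """
    n = len(coverage_history)
    # Document frequency: in how many past test cases each edge appeared.
    df = Counter(edge for cov in coverage_history for edge in cov)
    return [math.log((1 + n) / (1 + df[e])) for e in range(num_edges)]


def coverage_reward(new_cov, weights, is_valid=True):
    """Hypothetical reward: normalized sum of IDF weights over covered edges.

    Inputs that fail to parse or run (is_valid=False) get a fixed penalty,
    which is an assumption of this sketch.
    """
    if not is_valid:
        return -1.0
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(weights[e] for e in new_cov) / total


# Three past test cases over four edges; edge 0 is common, edge 3 is unseen.
history = [{0, 1}, {0, 2}, {0}]
weights = idf_weights(history, num_edges=4)

print(coverage_reward({0}, weights))   # only the common edge
print(coverage_reward({3}, weights))   # only the unseen edge
print(coverage_reward({3}, weights, is_valid=False))
```

Under these assumptions, a mutated test case that reaches a rarely covered edge earns a higher reward than one that only revisits common edges, which is the kind of signal a reinforcement learning update could use to steer an LLM-based mutator toward new coverage.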