Security vulnerabilities play a vital role in network security systems.
Fuzzing is widely used as a vulnerability discovery technique to uncover flaws
before they can cause damage. However, traditional fuzzing techniques face many
challenges, such as how to mutate input seed files, how to increase code
coverage, and how to bypass verification checks effectively. Machine learning
has been introduced into fuzzing as a new method to alleviate these challenges.
This paper reviews recent research progress on applying machine learning to
fuzzing, analyzes how machine learning improves the fuzzing process and its
results, and sheds light on future work in fuzzing. First, this paper discusses
why machine learning techniques can be applied to fuzzing scenarios and
identifies six different stages of fuzzing in which machine learning has been
used. Then, this paper systematically studies machine learning based fuzzing
models in terms of the selection of machine learning algorithms, pre-processing
methods, datasets, evaluation metrics, and hyperparameter settings. Next, this
paper assesses the performance of the machine learning models based on commonly
used evaluation metrics. The evaluation results show that machine learning
technology has an acceptable classification and prediction capability for
fuzzing. Finally, this paper compares the vulnerability discovery capability of
traditional fuzzing tools with that of machine learning based fuzzing tools. The
results show that introducing machine learning technology can improve fuzzing
performance. However, some limitations remain, such as unbalanced training
samples and the difficulty of extracting features related to vulnerabilities.