Abstract
Program analysis is a technique to reason about programs without executing
them, and it has various applications in compilers, integrated development
environments, and security. In this work, we present a machine learning
pipeline that induces a security analyzer for programs by example. The security
analyzer determines whether a program is secure or insecure based on
symbolic rules that were deduced by our machine learning pipeline. The
pipeline has two stages: a Recurrent Neural Network (RNN) and an
Extractor that converts the RNN to symbolic rules.
To evaluate the quality of the learned symbolic rules, we propose a
sampling-based similarity measurement between two infinite regular languages.
We conduct a case study using real-world data. In this work, we discuss the
limitations of existing techniques and possible improvements in the future. The
results show that with sufficient training data and a fair distribution of
program paths, it is feasible to deduce symbolic security rules for the
OpenJDK library, which comprises millions of lines of code.
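
To illustrate the idea of a sampling-based similarity between two infinite regular languages, here is a minimal sketch. It is not the measurement defined in this work: the abstract does not specify the sampling scheme, so the sketch assumes words are drawn with a uniformly random length up to a bound and uniformly random symbols, and it reports the fraction of sampled words on which two automata agree. The names DFA, sampled_similarity, n_samples, and max_len are hypothetical.

```python
import random


class DFA:
    """A deterministic finite automaton with a total transition function."""

    def __init__(self, alphabet, transitions, start, accepting):
        self.alphabet = list(alphabet)
        self.transitions = transitions  # maps (state, symbol) -> next state
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def sampled_similarity(a, b, n_samples=2000, max_len=12, seed=0):
    """Estimate agreement of two automata on randomly sampled words.

    Assumption (not from the paper): word lengths are uniform in
    [0, max_len] and symbols are uniform over the alphabet of `a`.
    Both automata must be defined over that same alphabet.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_samples):
        length = rng.randint(0, max_len)
        word = [rng.choice(a.alphabet) for _ in range(length)]
        if a.accepts(word) == b.accepts(word):
            agree += 1
    return agree / n_samples


# Hypothetical example languages over {a, b}.
# even_a accepts words containing an even number of 'a' symbols.
even_a = DFA(
    "ab",
    {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    start=0,
    accepting=[0],
)
# all_words accepts every word.
all_words = DFA("ab", {(0, "a"): 0, (0, "b"): 0}, start=0, accepting=[0])

if __name__ == "__main__":
    print(sampled_similarity(even_a, even_a))
    print(sampled_similarity(even_a, all_words))
```

Because both languages are infinite, such a score is only an estimate over the sampled words, and it depends on the chosen length bound and sampling distribution.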