SPARK: Static Program Analysis Reasoning and Retrieving Knowledge

Authors: Wasuwee Sodsong, Bernhard Scholz, Sanjay Chawla | Published: 2017-11-03

2017.11.032025.05.28

Authors: Wasuwee Sodsong, Bernhard Scholz, Sanjay Chawla
Published: 2017-11-03

Source: https://arxiv.org/abs/1711.01024

PDF: https://arxiv.org/pdf/1711.01024

Labels Predicted by AI

Machine Learning Knowledge Extraction Method Security Analysis Method

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Program analysis is a technique to reason about programs without executing them, and it has various applications in compilers, integrated development environments, and security. In this work, we present a machine learning pipeline that induces a security analyzer for programs by example. The security analyzer determines whether a program is either secure or insecure based on symbolic rules that were deduced by our machine learning pipeline. The machine pipeline is two-staged consisting of a Recurrent Neural Networks (RNN) and an Extractor that converts an RNN to symbolic rules. To evaluate the quality of the learned symbolic rules, we propose a sampling-based similarity measurement between two infinite regular languages. We conduct a case study using real-world data. In this work, we discuss the limitations of existing techniques and possible improvements in the future. The results show that with sufficient training data and a fair distribution of program paths it is feasible to deducing symbolic security rules for the OpenJDK library with millions lines of code.