Dynamic taint analysis (DTA) is widely used by various applications to track
information flow during runtime execution. Existing DTA techniques use
rule-based taint-propagation, which is neither accurate (i.e., high false
positive) nor efficient (i.e., large runtime overhead). It is hard to specify
taint rules for each operation while covering all corner cases correctly.
Moreover, the overtaint and undertaint errors can accumulate during the
propagation of taint information across multiple operations. Finally,
rule-based propagation requires each operation to be inspected before applying
the appropriate rules resulting in prohibitive performance overhead on large
real-world applications.
In this work, we propose NEUTAINT, a novel end-to-end approach to track
information flow using neural program embeddings. The neural program embeddings
model the target's programs computations taking place between taint sources and
sinks, which automatically learns the information flow by observing a diverse
set of execution traces. To perform lightweight and precise information flow
analysis, we utilize saliency maps to reason about most influential sources for
different sinks. NEUTAINT constructs two saliency maps, a popular machine
learning approach to influence analysis, to summarize both coarse-grained and
fine-grained information flow in the neural program embeddings.
We compare NEUTAINT with 3 state-of-the-art dynamic taint analysis tools. The
evaluation results show that NEUTAINT can achieve 68% accuracy, on average,
which is 10% improvement while reducing 40 times runtime overhead over the
second-best taint tool Libdft on 6 real world programs. NEUTAINT also achieves
61% more edge coverage when used for taint-guided fuzzing indicating the
effectiveness of the identified influential bytes.