Converting malware into images and classifying them with vision-based deep
learning algorithms has shown superior threat detection efficacy compared with
classical machine learning algorithms. When malware is visualized as images,
visual interpretation schemes can also be applied to extract insights into
why individual samples are classified as malicious. In this work, via two case
studies of dynamic malware classification, we extend the local interpretable
model-agnostic explanations (LIME) algorithm to explain image-based dynamic malware
classification and examine its interpretation fidelity. For both case studies,
we first train deep learning models via transfer learning on malware images,
demonstrate high classification effectiveness, apply an explanation method to
the images, and correlate the results back to the samples to validate whether
the algorithmic insights are consistent with security domain expertise. In our
first case study, the interpretation framework identifies indirect calls that
uniquely characterize the underlying exploit behavior of a malware family. In
our second case study, the interpretation framework extracts insightful
information such as cryptography-related APIs when applied to images created
from API existence, but generates ambiguous interpretations on images created
from API sequences and frequencies. Our findings indicate that current
image-based interpretation techniques are promising for explaining vision-based
malware classification. We continue to develop image-based interpretation
schemes specifically for security applications.
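
To make the interpretation step concrete, the following is a minimal sketch of a
LIME-style explanation on an API-existence image. It is not the implementation
used in this work: the function names, the fixed square patches standing in for
superpixels, the kernel width, and the toy scoring function that replaces a
trained deep learning model are all assumptions made for illustration.

```python
import numpy as np

def api_existence_image(present, side):
    """Lay out a binary API-existence vector as a side x side image (hypothetical layout)."""
    img = np.zeros(side * side, dtype=np.float64)
    img[:len(present)] = present  # assumes len(present) <= side * side
    return img.reshape(side, side)

def lime_patch_weights(image, predict, patch=2, n_samples=500, seed=0):
    """LIME-style surrogate: mask square patches at random, then fit a
    proximity-weighted ridge regression from patch masks to model scores."""
    rng = np.random.default_rng(seed)
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    masks = rng.integers(0, 2, size=(n_samples, rows * cols))
    masks[0] = 1  # keep one unperturbed sample
    scores = np.empty(n_samples)
    for i, m in enumerate(masks):
        # Expand the per-patch mask to pixel resolution and blank masked patches.
        keep = np.kron(m.reshape(rows, cols), np.ones((patch, patch)))
        scores[i] = predict(image * keep)
    # Samples that keep more of the original image get a larger weight.
    dist = 1.0 - masks.mean(axis=1)
    sample_w = np.exp(-(dist ** 2) / 0.25)
    X = np.hstack([masks, np.ones((n_samples, 1))])  # last column is the intercept
    sw = np.sqrt(sample_w)
    A, b = X * sw[:, None], scores * sw
    coef = np.linalg.solve(A.T @ A + 1e-3 * np.eye(X.shape[1]), A.T @ b)
    return coef[:-1].reshape(rows, cols)  # one weight per patch

# Toy stand-in for a trained classifier: the "malicious" score depends only
# on the APIs laid out in the top-left 2x2 patch.
def toy_score(img):
    return img[:2, :2].mean()

present = [1, 1, 0, 1,
           1, 1, 0, 0,
           0, 1, 0, 0,
           1, 0, 0, 1]
image = api_existence_image(present, side=4)
weights = lime_patch_weights(image, toy_score)
top_patch = np.unravel_index(weights.argmax(), weights.shape)
```

In this toy setting the highest-weighted patch is the one the scorer depends on.
Mapping such a patch back to the APIs it covers is the kind of correlation step
that allows an explanation to be checked against security domain expertise.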