Malware authors have always enjoyed the advantage of being able to
adversarially test and augment their malicious code against the anti-malware
products at their disposal before deploying a payload. Anti-malware developers
and threat experts, on the other hand, have no such privilege of proactively
tuning their products against zero-day attacks. This lets malware authors stay
a step ahead of anti-malware products, fundamentally biasing the cat-and-mouse
game played by the two parties. In this
paper, we propose an approach that enables machine-learning-based threat
prevention models to bridge that gap by tuning against a deep generative
adversarial network (GAN), which takes on the role of a malware author and
generates new types of malware. The GAN is trained on a reversible,
distributed RGB image representation of known malware behaviors that encodes
the sequence of API call n-grams and their corresponding term frequencies. The
generated images represent synthetic malware and can be decoded back to the
underlying API call sequence information. We demonstrate the image
representation not only as a general technique for incorporating the priors
needed to exploit convolutional neural network architectures for generative or
discriminative modeling, but also as a visualization method that eases manual
software or malware categorization by distributing individual API n-gram
information across the image space. In addition, we propose using
smart-definitions for detecting malware based on perceptual hashing of these
images. Such hashes are potentially more effective than cryptographic hashes,
which carry no meaningful similarity metric and therefore do not generalize
well.
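As a concrete illustration of the reversibility property, the encoding can be sketched as follows. This is a hypothetical layout, not the paper's actual specification: it assumes a fixed vocabulary that maps each API-call n-gram to an integer ID, splits that ID across the R and G channels, and stores a clipped term frequency in the B channel.

```python
# Hypothetical sketch of a reversible RGB encoding for API-call n-grams.
# Assumption: a fixed vocabulary maps each n-gram to an ID in 0..65535;
# the ID occupies the R and G channels and the (clipped) term frequency
# occupies the B channel. The paper's actual pixel layout may differ.

def encode(ngram_counts, vocab):
    """Turn {ngram: count} into a list of (R, G, B) pixel values."""
    pixels = []
    for ngram, count in ngram_counts.items():
        idx = vocab[ngram]            # n-gram ID, 0..65535
        r, g = idx >> 8, idx & 0xFF   # split the ID over two channels
        b = min(count, 255)           # clipped term frequency
        pixels.append((r, g, b))
    return pixels

def decode(pixels, inv_vocab):
    """Recover {ngram: count} from the pixel list."""
    return {inv_vocab[(r << 8) | g]: b for r, g, b in pixels}

# Toy vocabulary over 2-grams of (hypothetical) API call names.
vocab = {("NtOpenFile", "NtReadFile"): 1, ("NtWriteFile", "NtClose"): 2}
inv_vocab = {v: k for k, v in vocab.items()}
counts = {("NtOpenFile", "NtReadFile"): 7, ("NtWriteFile", "NtClose"): 3}

img = encode(counts, vocab)
assert decode(img, inv_vocab) == counts  # the round-trip is lossless
```

Because the mapping is a bijection (up to frequency clipping), any generated image whose pixels stay in the valid ID range decodes back to a concrete n-gram profile.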
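To make the similarity argument concrete, a minimal average-hash (aHash) style perceptual hash over an 8x8 grayscale thumbnail can be sketched as below; the hashing scheme actually used for smart-definitions may differ. Unlike a cryptographic hash, nearby images yield hashes at a small Hamming distance.

```python
# Hypothetical sketch of an average-hash (aHash) perceptual hash.
# Input: 64 grayscale intensities (an 8x8 thumbnail, row-major).
# Each bit records whether a cell is brighter than the mean, so
# visually similar images produce hashes that differ in few bits.

def ahash(gray8x8):
    """Return a 64-bit perceptual hash of 64 intensity values."""
    avg = sum(gray8x8) / 64
    bits = 0
    for v in gray8x8:
        bits = (bits << 1) | (1 if v > avg else 0)
    return bits

def hamming(h1, h2):
    """Number of differing bits: the similarity metric between hashes."""
    return bin(h1 ^ h2).count("1")

a = [10] * 32 + [200] * 32   # synthetic "image"
b = [10] * 31 + [200] * 33   # near-duplicate with one changed cell
assert hamming(ahash(a), ahash(a)) == 0   # identical -> distance 0
assert hamming(ahash(a), ahash(b)) == 1   # similar -> small distance
```

A detector can then flag an unknown sample whose image hash falls within a chosen Hamming-distance threshold of a known malware hash, which is exactly the generalization that a cryptographic hash cannot provide.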