As our professional, social, and financial existences become increasingly
digitized and as our government, healthcare, and military infrastructures rely
more on computer technologies, they present larger and more lucrative targets
for malware. Stealth malware in particular poses an increased threat because it
is specifically designed to evade detection mechanisms, spreading dormant in
the wild for extended periods of time, gathering sensitive information or
positioning itself for a high-impact zero-day attack. Policing the growing
attack surface requires efficient anti-malware solutions that generalize well
enough to detect novel types of malware and that resolve these occurrences
with as little burden on human experts as possible. In this paper,
we survey malicious stealth technologies as well as existing solutions for
detecting and categorizing these technologies autonomously. While machine
learning offers promising potential for increasingly autonomous solutions with
improved generalization to new malware types, both at the network level and at
the host level, our findings suggest that several flawed assumptions inherent
to most recognition algorithms prevent a direct mapping between the stealth
malware recognition problem and a machine learning solution. The most notable
of these flawed assumptions is the closed world assumption: that no sample
belonging to a class outside of a static training set will appear at query
time. We present a formalized adaptive open world framework for stealth malware
recognition and relate it mathematically to research from other machine
learning domains.
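
To make the closed world assumption concrete, the following minimal sketch
contrasts a classifier that must always answer with a training label against
one that may reject a query as unknown. It is an illustration only, not the
framework presented in this paper: the nearest-centroid model, the function
names, the class labels, the feature vectors, and the distance threshold are
all hypothetical choices made for the example.

```python
import math

def fit_centroids(samples):
    """Map each training label to the mean of its feature vectors."""
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in samples.items()
    }

def closed_world_predict(centroids, x):
    # Closed world: every query is forced into one of the training classes.
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

def open_world_predict(centroids, x, max_distance):
    # Open world (simplified): reject queries that lie far from every known class.
    label = closed_world_predict(centroids, x)
    if math.dist(centroids[label], x) > max_distance:
        return "unknown"
    return label

# Hypothetical two-dimensional features for two known classes.
training = {
    "benign": [[0.0, 0.0], [0.2, 0.1]],
    "known_rootkit": [[5.0, 5.0], [5.2, 4.9]],
}
centroids = fit_centroids(training)

# A sample from a class that was never seen during training.
novel_sample = [20.0, 0.0]

forced_label = closed_world_predict(centroids, novel_sample)
rejected_label = open_world_predict(centroids, novel_sample, max_distance=2.0)
```

Under these assumed values, the closed world classifier assigns the novel
sample to one of the two training classes even though it resembles neither,
whereas the thresholded variant returns "unknown" and can hand the sample to
a human expert or to a later labeling step.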