Identifying the tasks a given piece of malware was designed to perform (e.g.,
logging keystrokes, recording video, or establishing remote access) is a
difficult and time-consuming operation that is largely human-driven in
practice. In this paper, we present an automated method to identify malware
tasks. Using two different malware collections, we evaluate the method under
several conditions for each: cases where the training data differs
significantly from the test data, cases where the malware being evaluated
employs packing to thwart analytical techniques, and cases with sparse
training data. We find
that this approach consistently outperforms the current state-of-the-art
software for malware task identification as well as standard machine learning
approaches, often achieving an unbiased F1 score of over 0.9. In the near
future, we plan to deploy our approach for use by analysts in an operational
cyber-security environment.