Abstract
The use of TLS by malware poses new challenges to network threat detection
because traditional pattern-matching techniques can no longer be applied to its
messages. However, TLS also introduces a complex set of observable data
features that allow many inferences to be made about both the client and the
server. We show that these features can be used to detect and understand
malware communication, while at the same time preserving the privacy of benign
uses of encryption. These data features also allow for accurate malware family
attribution of network communication, even when restricted to a single,
encrypted flow.
To demonstrate this, we performed a detailed study of how TLS is used by
malware and enterprise applications. We provide a general analysis of millions
of TLS encrypted flows, and a targeted study on 18 malware families composed of
thousands of unique malware samples and tens of thousands of malicious TLS
flows. Importantly, we identify and accommodate the bias introduced by the use
of a malware sandbox. The performance of a malware classifier is correlated
with a malware family's use of TLS, i.e., malware families that actively evolve
their use of cryptography are more difficult to classify.
We conclude that malware's usage of TLS is distinct from benign usage in an
enterprise setting, and that these differences can be effectively used in rules
and machine learning classifiers.
External Datasets
malware traffic collected from August 2015 to May 2016
enterprise traffic collected during a 4-day period in May 2016