In recent years there has been a dramatic increase in the number of malware
attacks that use encrypted HTTP traffic for self-propagation or communication.
Antivirus software and firewalls typically do not have access to encryption
keys, so direct detection of malicious encrypted data is unlikely to
succeed. However, previous work has shown that traffic analysis can provide
indications of malicious intent, even in cases where the underlying data
remains encrypted. In this paper, we apply three machine learning techniques to
the problem of distinguishing malicious encrypted HTTP traffic from benign
encrypted traffic and obtain results comparable to those of previous work. We then
consider the problem of feature analysis in some detail. Previous work has
often relied on human expertise to determine the most useful and informative
features in this problem domain. We demonstrate that such feature-related
information can be obtained directly from machine learning models themselves.
We argue that such a machine-learning-based approach to feature analysis is
preferable, as it is more reliable and can, for example, uncover relatively
unintuitive interactions between features.