As people's demand for personal privacy and data security becomes a priority,
encrypted traffic has become mainstream in the cyber world. However, traffic
encryption also shields malicious and illegal traffic introduced by
adversaries from detection. This is especially so in the post-COVID-19
environment where malicious traffic encryption is growing rapidly. Common
security solutions that rely on plain payload content analysis, such as deep
packet inspection, are rendered useless. Thus, machine learning based approaches
have become an important direction for encrypted malicious traffic detection.
In this paper, we formulate a universal framework of machine learning based
encrypted malicious traffic detection techniques and provide a systematic
review. Furthermore, current studies train their models on different datasets
because well-recognized datasets and feature sets are lacking. As a
result, model performance cannot be compared and analyzed reliably.
Therefore, in this paper, we analyze, process and combine datasets from 5
different sources to generate a comprehensive and fair dataset to aid future
research in this field. On this basis, we also implement and compare 10
encrypted malicious traffic detection algorithms. We then discuss challenges
and propose future directions of research.