Botnet detection is a critical step in stopping the spread of botnets and
preventing malicious activities. However, reliable detection is still a
challenging task, due to a wide variety of botnets involving ever-increasing
types of devices and attack vectors. Recent approaches employing machine
learning (ML) have outperformed earlier ones, but these ML-based approaches
still have significant limitations. For example, most ML approaches cannot
incorporate the sequential pattern analysis techniques that are key to
detecting some classes of botnets. Another common shortcoming of ML-based
approaches is the need to retrain neural networks in order to detect
evolving botnets; the training process is time-consuming and requires
significant effort to label the training data. Fast-evolving botnets may
change again before sufficient training samples can be created.
To address these challenges, we propose a novel botnet
detection method, built upon a Recurrent Variational Autoencoder (RVAE) that
effectively captures sequential characteristics of botnet activities. In
experiments, this semi-supervised learning method achieves better detection
accuracy than similar learning methods, especially on hard-to-detect classes.
Additionally, we devise a transfer learning framework to learn from a
well-curated source data set and transfer the knowledge to a target problem
domain not seen before. Tests show that the true-positive rate (TPR) with
transfer learning is higher than that of the RVAE semi-supervised learning
method trained using the target data set (91.8% vs. 68.3%).