Bot detection using machine learning (ML) with network flow-level features
has been studied extensively in the literature. However, existing flow-based
approaches typically incur high computational overhead and do not fully
capture network communication patterns, which can reveal additional aspects
of malicious hosts. Recently, bot detection systems that apply ML to
communication graph analysis have gained attention as a way to overcome these
limitations. A graph-based approach is intuitive, since graphs are faithful
representations of network communications. In this paper, we propose a
two-phase, graph-based bot detection system that leverages both unsupervised
and supervised ML. The first phase prunes presumably benign hosts, while the
second phase detects bots with high precision. Our system detects multiple
types of bots and is robust to zero-day attacks. It also accommodates
different network topologies and is suitable for large-scale data.
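To make the two-phase structure concrete, the following is a minimal sketch under stated assumptions; it is not the system proposed in this paper. Every concrete choice is a hypothetical stand-in: the per-host graph features (in-degree, out-degree, flow count), 2-means clustering as the unsupervised phase, the heuristic that the largest cluster is benign and can be pruned, a nearest-centroid classifier as the supervised phase, and all host names, flows, and training labels, which are toy data.

```python
from collections import defaultdict
import math

# Hypothetical illustration only: the features, the clustering step, and the
# classifier are stand-ins, not the method described in the abstract.


def graph_features(flows):
    """Map each host to (in-degree, out-degree, flow count) over (src, dst) flows."""
    out_peers = defaultdict(set)
    in_peers = defaultdict(set)
    flow_count = defaultdict(int)
    for src, dst in flows:
        out_peers[src].add(dst)
        in_peers[dst].add(src)
        flow_count[src] += 1
        flow_count[dst] += 1
    hosts = sorted(set(out_peers) | set(in_peers))
    return {
        h: (float(len(in_peers[h])), float(len(out_peers[h])), float(flow_count[h]))
        for h in hosts
    }


def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans2(points, iters=20):
    """Plain 2-means, seeded with the smallest- and largest-norm points."""
    ordered = sorted(points, key=lambda p: sum(v * v for v in p))
    centroids = [ordered[0], ordered[-1]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(2), key=lambda k: dist(p, centroids[k])) for p in points]
        for k in range(2):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean(members)
    return labels


def phase1_prune(features):
    """Phase 1 (unsupervised): drop hosts in the largest cluster, assumed benign."""
    hosts = list(features)
    labels = kmeans2([features[h] for h in hosts])
    largest = max(set(labels), key=labels.count)
    return [h for h, lab in zip(hosts, labels) if lab != largest]


def train_nearest_centroid(samples):
    """Phase 2 training: one centroid per label from (features, label) pairs."""
    by_label = defaultdict(list)
    for x, y in samples:
        by_label[y].append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def phase2_classify(model, features, candidates):
    """Phase 2 (supervised): label each remaining host by its nearest centroid."""
    return {h: min(model, key=lambda y: dist(features[h], model[y])) for h in candidates}


# Toy network: six clients use a web server, a benign updater contacts the
# clients, and two bots contact the same eight targets.
flows = [(f"h{i}", "web") for i in range(1, 7)]
flows += [("upd", f"h{i}") for i in range(1, 7)]
flows += [(b, f"t{j}") for b in ("b1", "b2") for j in range(1, 9)]

features = graph_features(flows)
candidates = phase1_prune(features)  # ['b1', 'b2', 'upd']

# Hypothetical labeled training hosts for the supervised phase.
labeled = [
    ((0.0, 5.0, 5.0), "benign"),
    ((1.0, 6.0, 6.0), "benign"),
    ((0.0, 9.0, 9.0), "bot"),
    ((0.0, 10.0, 10.0), "bot"),
]
model = train_nearest_centroid(labeled)
print(phase2_classify(model, features, candidates))
# {'b1': 'bot', 'b2': 'bot', 'upd': 'benign'}
```

In this toy run, phase 1 prunes the 14 clients and targets plus the web server, leaving three candidates. The benign updater survives pruning because its out-degree resembles the bots', and only the supervised phase rejects it. This illustrates why an unsupervised pruning step paired with a supervised classifier can cut the workload while keeping precision, without implying that these particular models are the ones the paper uses.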