In the face of large-scale automated social engineering attacks on large
online services, fast detection and remediation of compromised accounts are
crucial to limit the spread of new attacks and to mitigate the overall damage
to users, companies, and the public at large. We advocate a fully automated
approach based on machine learning: we develop an early warning system that
harnesses account activity traces to predict which accounts are likely to be
compromised in the future and to generate suspicious activity. We hypothesize that
this early warning is key to more timely detection of compromised accounts
and consequently faster remediation. We demonstrate the feasibility and
applicability of the system through an experiment at a large-scale online
service provider using four months of real-world production data encompassing
hundreds of millions of users. We show that, even using only login data to
derive features with low computational cost and a basic model selection
approach, our classifier can be tuned to achieve good classification precision
when used for forecasting. Our system correctly identifies, up to one month in
advance, the accounts later flagged as suspicious, with precision, recall, and
false positive rates that indicate the mechanism is likely to prove valuable in
operational settings to support additional layers of defense.