Cyber attacks are growing in frequency and severity. Over the past year alone
we have witnessed massive data breaches that exposed the personal information
of millions of people and wide-scale ransomware attacks that paralyzed critical
infrastructure in several countries. Combating the rising cyber threat calls
for a multi-pronged strategy, one component of which is predicting when attacks
will occur. The intuition driving our approach is this: during the planning and
preparation stages, hackers leave digital traces of their activities on both
the surface web and the dark web, in the form of discussions on platforms such
as hacker forums, social media, and blogs. These data provide predictive
signals for anticipating cyber attacks. In this paper, we describe
machine learning techniques, based on deep neural networks and autoregressive
time series models, that leverage external signals from publicly available Web
sources to forecast cyber attacks. Evaluated against ground-truth data on
real-world forecasting tasks, our methods yield a significant lift in F1 score
on predicted cyber attacks for the top signals. Our results suggest that, when
deployed, our system could provide an effective line of defense against
various types of targeted cyber attacks.
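To make "autoregressive time series models that leverage external signals" concrete, the sketch below fits a generic ARX model in which the attack count y_t is regressed on its own recent lags and on recent lags of one external signal x_t (for example, a daily count of relevant forum posts). This is a minimal illustration under stated assumptions, not the authors' model: the single exogenous signal, the lag orders `p` and `q`, the ordinary least-squares fit, and the function names `fit_arx` and `forecast_next` are all hypothetical choices made for this example.

```python
import numpy as np


def fit_arx(y, x, p=2, q=2):
    """Hypothetical ARX fit by ordinary least squares:
    y_t = c + sum_{i=1..p} a_i * y_{t-i} + sum_{j=1..q} b_j * x_{t-j}
    Returns coefficients ordered as [c, a_1..a_p, b_1..b_q].
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    k = max(p, q)  # first index with enough history for every lag
    rows, targets = [], []
    for t in range(k, len(y)):
        lags_y = y[t - p:t][::-1]  # y_{t-1}, ..., y_{t-p}
        lags_x = x[t - q:t][::-1]  # x_{t-1}, ..., x_{t-q}
        rows.append(np.concatenate(([1.0], lags_y, lags_x)))
        targets.append(y[t])
    design = np.array(rows)
    coef, *_ = np.linalg.lstsq(design, np.array(targets), rcond=None)
    return coef


def forecast_next(coef, y, x, p=2, q=2):
    """One-step-ahead forecast of y using the most recent p and q lags."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    features = np.concatenate(([1.0], y[-p:][::-1], x[-q:][::-1]))
    return float(features @ coef)


if __name__ == "__main__":
    # Synthetic data only: the "attack count" responds to yesterday's signal.
    rng = np.random.default_rng(0)
    signal = rng.normal(size=100)
    attacks = np.zeros(100)
    for t in range(1, 100):
        attacks[t] = 1.0 + 0.5 * attacks[t - 1] + 2.0 * signal[t - 1]
    coef = fit_arx(attacks, signal, p=1, q=1)
    print("coefficients [c, a_1, b_1]:", coef)
    print("next-step forecast:", forecast_next(coef, attacks, signal, p=1, q=1))
```

If the external signal carries predictive information, its lag coefficients (b_j) absorb variance that the purely autoregressive terms cannot, which is the intuition behind using Web discussions as leading indicators.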