Abstract
Many organizations rely on Threat Intelligence (TI) feeds to assess the risk
associated with security threats. Because of the volume and heterogeneity of
the data, manually analyzing the threat information available in different
loosely structured TI feeds is prohibitively costly. Thus, there is a need to develop
automated methods to vet and extract actionable information from TI feeds. To
this end, we present a machine learning pipeline to automatically detect
vulnerability exploitation from TI feeds. We first model threat vocabulary in
loosely structured TI feeds using state-of-the-art embedding techniques
(Doc2Vec and BERT) and then use the resulting embeddings to train a supervised machine learning
classifier to detect exploitation of security vulnerabilities. We use our
approach to identify exploitation events in 191 different TI feeds. Our
longitudinal evaluation shows that it accurately identifies exploitation
events in TI feeds when trained only on past data, and even in TI feeds
withheld from training. Our proposed approach is useful for a variety
of downstream tasks such as data-driven vulnerability risk assessment.
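
The two-stage pipeline described above (embed feed text, then train a supervised classifier evaluated on later data) can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: a hashed bag-of-words vector stands in for the Doc2Vec and BERT embeddings, a hand-written logistic regression stands in for the unspecified classifier, and the events, feed names, labels, and the `CUTOFF_DAY` value are all hypothetical.

```python
import math
import zlib

DIM = 64  # toy embedding size (arbitrary assumption)

def embed(text, dim=DIM):
    """Hashed bag-of-words vector, L2-normalized.
    Stand-in for the Doc2Vec/BERT embeddings named in the abstract."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train_logreg(X, y, epochs=300, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1.0 / (1.0 + math.exp(-z)) - label
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict_proba(model, x):
    """Estimated probability that an event describes exploitation."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical events: (day, feed, text, label), label 1 = exploitation.
EVENTS = [
    (1, "feed_a", "exploit observed in the wild remote code execution", 1),
    (2, "feed_b", "vendor advisory released patch available", 0),
    (3, "feed_a", "active exploitation attempts detected exploit payload", 1),
    (4, "feed_b", "informational scan report patch notes", 0),
    (5, "feed_c", "exploit payload delivered active exploitation", 1),
    (6, "feed_c", "advisory informational patch available", 0),
]

# Longitudinal split: train only on events strictly before the cutoff day.
CUTOFF_DAY = 5
train_events = [e for e in EVENTS if e[0] < CUTOFF_DAY]
test_events = [e for e in EVENTS if e[0] >= CUTOFF_DAY]

model = train_logreg(
    [embed(text) for _, _, text, _ in train_events],
    [label for _, _, _, label in train_events],
)
scores = [predict_proba(model, embed(text)) for _, _, text, _ in test_events]

for (day, feed, _, label), score in zip(test_events, scores):
    print(f"day={day} feed={feed} label={label} score={score:.3f}")
```

In this toy data the later events also come from a feed (`feed_c`) that never appears in the training set, so the same split loosely mirrors both evaluation settings mentioned in the abstract. A dedicated held-out-feed evaluation would filter on the feed field instead of the day.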