Proactive approaches to security, such as adversary emulation, leverage
information about threat actors and their techniques (Cyber Threat
Intelligence, CTI). However, most CTI still comes in unstructured forms (i.e.,
natural language), such as incident reports and leaked documents. To support
proactive security efforts, we present an experimental study on the automatic
classification of unstructured CTI into attack techniques using machine
learning (ML). We contribute two new datasets for CTI analysis, and we
evaluate several ML models, including both traditional and deep learning-based
ones. We report lessons learned on how well ML performs at this task, which
classifiers perform best and under which conditions, what the main causes of
classification errors are, and what challenges lie ahead for CTI analysis.