Underground forums where users discuss, buy, and sell illicit services and
goods facilitate a better understanding of the economy and organization of
cybercriminals. Prior work has shown that private interactions, in particular,
provide a wealth of information about the cybercriminal ecosystem. Yet those
messages are seldom available to analysts, except when there is a leak. To
address this problem, we propose a supervised machine-learning method that
predicts which public threads will generate private messages, after a
partial leak of such messages has occurred. To the best of our knowledge, we
are the first to develop a solution to overcome the barrier posed by limited to
no information on private activity for underground forum analysis.
Additionally, we propose an automated method for labeling posts, significantly
reducing the cost of our approach in the presence of real unlabeled data. This
method can be tuned to focus on the likelihood of users receiving private
messages, or of threads triggering private interactions. We evaluate the
performance of our methods using data from three real forum leaks. Our results
show that public information can indeed be used to predict private activity,
although prediction models do not transfer well between forums. We also find
that neither the length of the leak period nor the time between the leak and
the prediction has a significant impact on our technique's performance, and that
NLP features dominate the predictive power.