Inferring Discussion Topics about Exploitation of Vulnerabilities from Underground Hacking Forums

Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

Economic factors of vulnerability trade and exploitation

Allodi, L.

Published: 2017

The 2019 Workshop on the Economics of Information Security

Measuring the changing cost of cybercrime

Anderson, R., et al.

Published: 2019

Journal of Computer Networks and Communications

Threats from the dark: a review over dark web investigation research for cyber threat intelligence

Basheer, R., Alkhatib, B.

Published: 2021

Proceedings of the 23rd International Conference on Machine Learning

Dynamic topic models

Blei, D.M., Lafferty, J.D.

Published: 2006

J. Mach. Learn. Res.

Latent Dirichlet allocation

Blei, D.M., Ng, A., Jordan, M.I.

Published: 2003

APWG eCrime 2022

Threat/crawl: a trainable, highly-reusable, and extensible automated method and tool to crawl criminal underground forums

Campobasso, M., Allodi, L.

Published: 2022

Network and Distributed System Security Symposium

Towards automated dynamic analysis for Linux-based embedded firmware

Chen, D.D., Woo, M., Brumley, D., Egele, M.

Published: 2016

2015 IEEE 2nd International Conference on Cyber Security and Cloud Computing

Predicting vulnerability exploits in the wild

Edkrantz, M., Truve, S., Said, A.

Published: 2015

Conference on Uncertainty in Artificial Intelligence

Probabilistic latent semantic analysis

Hofmann, T.

Published: 1999

Radiographics : a review publication of the Radiological Society of North America, Inc

Bag-of-words technique in natural language processing: A primer for radiologists

Juluru, K., Shih, H.H., Murthy, K.N.K., Elnajjar, P.

Published: 2021

Nature

Learning the parts of objects by non-negative matrix factorization

Lee, D.D., Seung, H.S.

Published: 1999

Annual International Symposium on Information Management and Big Data

Car monitoring system in apartments’ garages by small autonomous car using deep learning

Leon-Vera, L., Moreno-Vera, F.

Published: 2018

IEEE Transactions on Reliability

Fuzzing: State of the art

Liang, H., Pei, X., Jia, X., Shen, W., Zhang, J.

Published: 2018

Smart Trends in Computing and Communications

Morarch: A software architecture for interoperability to improve the communication in the edge layer of a smart IoT ecosystem

Moreno-Motta, J., Moreno-Vera, F., Moreno, F.A.

Published: 2022

2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI)

Performing deep recurrent double Q-learning for Atari games

Moreno-Vera, F.

Published: 2019

2019 IEEE World Conference on Engineering Education (EDUNINE)

Comparison of the learning curve and adaptive behavior from kids to adults using computational thinking with block-programming to technology enhanced learning

Moreno-Vera, F., Leon-Vera, L., Moreno-Motta, J., Guizado-Vasquez, J., Vera-Panez, M.

Published: 2019

International Conference on Intelligent Computing

Understanding safety based on urban perception

Moreno-Vera, F.

Published: 2021

arXiv

Cream Skimming the Underground: Identifying Relevant Information Points from Online Forums

Felipe Moreno-Vera, Mateus Nogueira, Cainã Figueiredo, Daniel Sadoc Menasché, Miguel Bicudo, Ashton Woiwood, Enrico Lovat, Anton Kocheturov, Leandro Pfleger de Aguiar

Published: 8.4.2023

This paper proposes a machine learning-based approach for detecting the exploitation of vulnerabilities in the wild by monitoring underground hacking forums. The increasing volume of posts discussing exploitation in the wild calls for an automatic approach that processes threads and posts and triggers alarms depending on their content. To illustrate the proposed system, we use the CrimeBB dataset, which contains data scraped from multiple underground forums, and develop a supervised machine learning model that filters threads citing CVEs and labels them as Proof-of-Concept, Weaponization, or Exploitation. Leveraging random forests, we show that accuracy, precision, and recall above 0.99 are attainable for the classification task. Additionally, we provide insights into the difference in nature between weaponization and exploitation, e.g., by interpreting the output of a decision tree, and analyze the profits and other aspects related to the hacking communities. Overall, our work sheds light on the exploitation of vulnerabilities in the wild and can be used to provide additional ground truth to models such as EPSS and Expected Exploitability.

Vulnerability Management, Data Collection, Cyber Attack
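
The abstract above mentions filtering threads that cite CVEs before they are labeled. A minimal sketch of that filtering step might look like the following. It is an illustration under stated assumptions, not the authors' implementation: the function names, the thread layout (a mapping from thread id to a list of post strings), and the toy data are all hypothetical, and the only thing assumed about CVE identifiers is the public `CVE-YYYY-NNNN` format with four or more digits in the sequence part.

```python
import re

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more digits
# in the sequence part. Matching is case-insensitive because forum posts
# often write "cve-..." in lowercase.
CVE_PATTERN = re.compile(r"\bCVE-(\d{4})-(\d{4,})\b", re.IGNORECASE)


def extract_cves(text):
    """Return the sorted, de-duplicated CVE identifiers mentioned in text."""
    return sorted({f"CVE-{year}-{seq}" for year, seq in CVE_PATTERN.findall(text)})


def filter_threads_citing_cves(threads):
    """Keep only threads whose posts mention at least one CVE.

    `threads` is a hypothetical layout: thread id -> list of post strings.
    Returns thread id -> list of normalized CVE identifiers found.
    """
    selected = {}
    for thread_id, posts in threads.items():
        cves = extract_cves(" ".join(posts))
        if cves:
            selected[thread_id] = cves
    return selected


# Toy data invented for illustration; it is not taken from CrimeBB.
threads = {
    "t1": ["Anyone have a working PoC for cve-2017-0144?", "Try the public one."],
    "t2": ["Selling game accounts, cheap."],
    "t3": ["CVE-2021-44228 is being exploited already", "also see CVE-2021-45046"],
}
result = filter_threads_citing_cves(threads)
print(result)
```

The threads that survive this filter would then be passed to a supervised classifier (the abstract names random forests) to assign the Proof-of-Concept, Weaponization, or Exploitation label; that step is omitted here because the abstract does not specify features or training details.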

NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic

Gensim–Python framework for vector space modelling

R. Rehurek, P. Sojka

Published: 2011

Inf. Process. Manag.

Term-weighting approaches in automatic text retrieval

Salton, G., Buckley, C.

Published: 1988

The future of cybercrime & security

Susan Morrow, T.C.

Published: 2019

Fuzzing: Brute force vulnerability discovery

Sutton, M.S., Greene, A.R., Amini, P.F.

Published: 2007

Journal of the American Statistical Association

Hierarchical Dirichlet processes

Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.

Published: 2006