AI Security Portal K Program
Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses
Abstract
In recent years, cybersecurity threats have grown in both number and diversity, leading to a corresponding increase in their reporting and analysis. In response, non-profit organizations such as MITRE and OWASP have been actively tracking vulnerabilities and publishing defense recommendations in standardized formats. Because producing data in such formats manually is very time-consuming, there have been several proposals to automate the process. Unfortunately, a major obstacle to adopting supervised machine learning for this problem has been the lack of publicly available specialized datasets. Here, we aim to bridge this gap. In particular, we focus on mapping CVE records to MITRE CWE weaknesses, and we release to the research community a manually annotated dataset of 4,012 records for this task. With a human-in-the-loop framework in mind, we approach the problem as a ranking task and, in future work, aim to incorporate reinforcement learning to make use of human feedback. Our experimental results using fine-tuned deep learning models, namely Sentence-BERT and RankT5, show sizable performance gains over BM25, BERT, and RoBERTa, which demonstrates the need for an architecture capable of good semantic understanding for this task.
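To make the ranking formulation concrete, the following is a minimal sketch of a lexical BM25 baseline that scores candidate CWE entries against a CVE description. It is not the authors' implementation: the tokenizer, the parameters `k1` and `b`, the paraphrased weakness summaries, and the example CVE-style text are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative, paraphrased weakness summaries (assumption: not official CWE text).
CWE_DOCS = {
    "CWE-79": "improper neutralization of input during web page generation cross-site scripting",
    "CWE-89": "improper neutralization of special elements used in an sql command sql injection",
    "CWE-787": "out-of-bounds write past the end of a memory buffer",
}


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_id, score) pairs sorted by descending Okapi BM25 score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many candidate entries each term appears.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical CVE-style description, written for this example only.
cve_text = (
    "SQL injection in the login form allows remote attackers "
    "to execute arbitrary SQL commands"
)
ranking = bm25_rank(cve_text, CWE_DOCS)
for cwe_id, score in ranking:
    print(f"{cwe_id}\t{score:.3f}")
```

A lexical scorer of this kind only rewards exact token overlap, so a CVE description that paraphrases a weakness without sharing its vocabulary would score poorly. That limitation is consistent with the abstract's observation that models with stronger semantic understanding outperform BM25 on this task.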