Knowledge mining of unstructured information: application to cyber-domain

Authors: Tuomas Takko, Kunal Bhattacharya, Martti Lehto, Pertti Jalasvirta, Aapo Cederberg, Kimmo Kaski | Published: 2021-09-08 | Updated: 2022-08-01

2021.09.082025.05.28

Authors: Tuomas Takko, Kunal Bhattacharya, Martti Lehto, Pertti Jalasvirta, Aapo Cederberg, Kimmo Kaski
Published: 2021-09-08 | Updated: 2022-08-01

Source: https://arxiv.org/abs/2109.03848

PDF: https://arxiv.org/pdf/2109.03848

Labels Predicted by AI

Knowledge Graph Risk Assessment Method Information Extraction Method

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Information on cyber-related crimes, incidents, and conflicts is abundantly available in numerous open online sources. However, processing the large volumes and streams of data is a challenging task for the analysts and experts, and entails the need for newer methods and techniques. In this article we present and implement a novel knowledge graph and knowledge mining framework for extracting the relevant information from free-form text about incidents in the cyberdomain. The framework includes a machine learning based pipeline for generating graphs of organizations, countries, industries, products and attackers with a non-technical cyber-ontology. The extracted knowledge graph is utilized to estimate the incidence of cyberattacks on a given graph configuration. We use publicly available collections of real cyber-incident reports to test the efficacy of our methods. The knowledge extraction is found to be sufficiently accurate, and the graph-based threat estimation demonstrates a level of correlation with the actual records of attacks. In practical use, an analyst utilizing the presented framework can infer additional information from the current cyber-landscape in terms of risk to various entities and propagation of the risk heuristic between industries and countries.