A Graph-based Stratified Sampling Methodology for the Analysis of (Underground) Forums

arXiv

AI Security Portal bot

Information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2308.09413

PDF

https://arxiv.org/pdf/2308.09413

Bibliographic Information

Authors: Giorgio Di Tizio; Gilberto Atondo Siu; Alice Hutchings; Fabio Massacci
Published: 2023-08-18
Affiliation: University of Trento
Country of affiliation: Italy
Venue: IEEE Trans. Inf. Forensics Secur.

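AI-Estimated Labels

Data collection

Model performance evaluation

Machine learning techniques

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".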

Abstract

[Context] Researchers analyze underground forums to study abuse and cybercrime activities. Due to the size of the forums and the domain expertise required to identify criminal discussions, most approaches employ supervised machine learning techniques to automatically classify the posts of interest. [Goal] Human annotation is costly. How can samples be selected for annotation so that they account for the structure of the forum? [Method] We present a methodology to generate stratified samples based on information about the centrality properties of the population and evaluate classifier performance. [Result] We observe that by employing a sample obtained from a uniform distribution of the post degree centrality metric, we maintain the same level of precision but significantly increase the recall (+30%) compared to a sample whose distribution respects the population stratification. We find that classifiers trained with similar samples disagree on the classification of criminal activities up to 33% of the time when deployed on the entire forum.
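
The contrast drawn in the abstract, between a sample that follows the population stratification and one drawn uniformly across degree-centrality strata, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reply graph, the equal-width binning into strata, and the function names `degree_centrality`, `stratify`, and `sample_strata` are all assumptions made for illustration, since this entry does not describe how the paper builds its graph or defines its strata.

```python
import random
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality per node: degree / (n - 1) on an undirected graph."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def stratify(centrality, n_strata):
    """Group nodes into equal-width centrality bins (hypothetical strata definition)."""
    lo, hi = min(centrality.values()), max(centrality.values())
    width = (hi - lo) / n_strata or 1.0  # avoid a zero width if all values are equal
    strata = defaultdict(list)
    for node, c in centrality.items():
        idx = min(int((c - lo) / width), n_strata - 1)
        strata[idx].append(node)
    return dict(strata)


def sample_strata(strata, budget, mode, rng):
    """Draw up to `budget` nodes, proportionally or uniformly across strata."""
    total = sum(len(nodes) for nodes in strata.values())
    sample = []
    for idx in sorted(strata):
        nodes = strata[idx]
        if mode == "proportional":
            # Quota mirrors the stratum's share of the population.
            quota = round(budget * len(nodes) / total)
        elif mode == "uniform":
            # Every stratum gets the same quota, regardless of its size.
            quota = budget // len(strata)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # A stratum smaller than its quota contributes all of its nodes.
        sample.extend(rng.sample(nodes, min(quota, len(nodes))))
    return sample


# Toy graph (assumption): node 0 and node 1 are highly connected posts,
# all other nodes have a single connection.
edges = [(0, i) for i in range(1, 21)] + [(1, i) for i in range(21, 31)]
strata = stratify(degree_centrality(edges), n_strata=3)

rng = random.Random(0)
proportional = sample_strata(strata, budget=6, mode="proportional", rng=rng)
uniform = sample_strata(strata, budget=6, mode="uniform", rng=rng)
print("proportional:", sorted(proportional))
print("uniform:     ", sorted(uniform))
```

In this toy setting the proportional sample contains only low-centrality nodes, because the two sparse high-centrality strata receive a quota of zero, whereas the uniform sample always includes the two highly connected nodes. The uniform sample may come in under budget when a stratum has fewer nodes than its quota; how to reallocate the remainder is a design choice left out of this sketch.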

External Datasets

CrimeBB dataset

Hack Forums