Malware Classification with Word Embedding Features

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2103.02711

PDF

https://arxiv.org/pdf/2103.02711

Literature Information

Authors: Aparna Sunil Kale; Fabio Di Troia; Mark Stamp
Published: 2021-3-4
Affiliation: San Jose State University
Country of affiliation: United States of America
Conference: International Conference on Information Systems Security and Privacy (ICISSP)

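Labels Estimated by AI

Machine learning, membership inference, multi-class classification

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".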

Abstract

Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte $n$-grams, among many others. In this research, we consider opcode features. We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models -- a technique that we refer to as HMM2Vec -- and Word2Vec embeddings on these opcode sequences. The resulting HMM2Vec and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), $k$-nearest neighbor ($k$-NN), random forest (RF), and convolutional neural network (CNN) classifiers. We conduct substantial experiments over a variety of malware families. Our experiments extend well beyond any previous work in this field.
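
The pipeline outlined in the abstract (embed opcodes, then pass the embedding vectors to a classifier) can be illustrated with a small sketch. This is not the authors' implementation. To keep it dependency-free, plain co-occurrence counts stand in for Word2Vec or HMM2Vec, and a 1-nearest-neighbor vote stands in for the paper's classifiers. The opcode sequences, family labels "A" and "B", the vocabulary, the window size, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def cooccurrence_embeddings(sequence, vocab, window=2):
    """Embed each opcode in `vocab` as its co-occurrence counts with the
    other vocab opcodes within `window` positions (a simple stand-in for
    Word2Vec, NOT the embedding used in the paper)."""
    index = {op: i for i, op in enumerate(vocab)}
    vectors = {op: [0.0] * len(vocab) for op in vocab}
    for i, op in enumerate(sequence):
        if op not in index:
            continue
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i and sequence[j] in index:
                vectors[op][index[sequence[j]]] += 1.0
    return vectors


def sample_features(sequence, vocab, window=2):
    """Concatenate the L2-normalized per-opcode vectors into one
    fixed-length feature vector for a single sample."""
    vectors = cooccurrence_embeddings(sequence, vocab, window)
    features = []
    for op in vocab:
        row = vectors[op]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by zero
        features.extend(x / norm for x in row)
    return features


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=1):
    """Majority vote among the k training samples closest to `query`.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two made-up "families" with different opcode habits.
VOCAB = ["mov", "push", "call", "add"]
family_a_train = ["mov", "push", "call"] * 3
family_b_train = ["add", "add", "mov", "add", "add", "mov", "add", "add"]
family_a_test = ["mov", "push", "call", "mov", "push", "call", "add"]
family_b_test = ["add", "mov", "add", "add", "mov", "add", "push"]

train = [
    (sample_features(family_a_train, VOCAB), "A"),
    (sample_features(family_b_train, VOCAB), "B"),
]
pred_a = knn_predict(train, sample_features(family_a_test, VOCAB))
pred_b = knn_predict(train, sample_features(family_b_test, VOCAB))
print(pred_a, pred_b)
```

In a realistic setting one would presumably swap the co-occurrence step for a trained embedding model and the vote for a library classifier; the point here is only the shape of the data flow, in which every sample becomes one fixed-length vector built from per-opcode embeddings.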

External Datasets

2793 malware families with one or more samples per family

seven families with more than 1000 samples