A multi-task learning model for malware classification with useful file access pattern from API call sequence


arXiv

AI Security Portal bot

The information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1610.05945

PDF

https://arxiv.org/pdf/1610.05945

Bibliographic information

Authors: Xin Wang, Siu Ming Yiu
Published: 2016-10-19
Affiliation: Department of Computer Science, The University of Hong Kong
Country of affiliation: Hong Kong
Venue: Computing Research Repository (CoRR)

Labels estimated by AI

Malware classification, Model identification, API security

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Based on API call sequences, semantic-aware, machine learning (ML) based malware classifiers can be built for malware detection or classification. Previous works concentrate on crafting and extracting various features from malware binaries, disassembled binaries, or API calls via static or dynamic analysis, then resort to ML to build classifiers. However, they tend to involve too much feature engineering and fail to provide interpretability. We solve these two problems with recent advances in deep learning: 1) RNN-based autoencoders (RNN-AEs) can automatically learn a low-dimensional representation of a malware sample from its raw API call sequence; 2) multiple decoders can be trained under different supervision to give more information than the class or family label of a malware sample alone. Inspired by work on document classification and automatic sentence summarization, we regard each API call sequence as a sentence. In this paper, we make the first attempt to build a multi-task malware learning model based on API call sequences. The model consists of two decoders, one for malware classification and one for file access pattern (FAP) generation given the API call sequence of a malware sample. We base our model on the general seq2seq framework. Experiments show that our model gives competitive classification results as well as insightful FAP information.
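The abstract's framing, in which an API call sequence is treated as a sentence and one encoder input is paired with two supervision targets, can be illustrated with a minimal data-preparation sketch. This is not the authors' code: the special tokens, the `MultiTaskExample` container, the toy API trace, the FAP token format, and the family label set are all assumptions made up for illustration.

```python
from dataclasses import dataclass

PAD, BOS, EOS, UNK = "<pad>", "<bos>", "<eos>", "<unk>"


def build_vocab(sequences):
    """Map every distinct token to an integer id; special tokens come first."""
    vocab = {tok: i for i, tok in enumerate([PAD, BOS, EOS, UNK])}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq, vocab, max_len):
    """Wrap a sequence in BOS/EOS, truncate to max_len, and right-pad with PAD."""
    ids = [vocab[BOS]] + [vocab.get(t, vocab[UNK]) for t in seq][: max_len - 2]
    ids.append(vocab[EOS])
    return ids + [vocab[PAD]] * (max_len - len(ids))


@dataclass
class MultiTaskExample:
    api_ids: list    # encoder input: the API call "sentence"
    family_id: int   # target for the classification decoder
    fap_ids: list    # target for the FAP-generation decoder


# Hypothetical toy sample; real traces are far longer.
api_calls = ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"]
fap = ["create", "write", "close"]    # invented FAP token format
families = {"trojan": 0, "worm": 1}   # invented label set

api_vocab = build_vocab([api_calls])
fap_vocab = build_vocab([fap])

example = MultiTaskExample(
    api_ids=encode(api_calls, api_vocab, max_len=8),
    family_id=families["trojan"],
    fap_ids=encode(fap, fap_vocab, max_len=6),
)
print(example)
```

In a seq2seq setup, `api_ids` would feed a shared encoder, while `family_id` and `fap_ids` would supervise the two decoders separately; how the paper combines the two training signals is not stated in the abstract.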

External datasets

API call sequence dataset (Kim 2016)

7430 samples for coarse-grained evaluation

4932 samples for fine-grained evaluation