In recent years, the topic of explainable machine learning (ML) has been
extensively researched. Until now, this research has focused on the use cases
of regular ML users, such as debugging an ML model. This paper takes a
different stance and shows that adversaries can leverage explainable ML to
bypass multi-feature-type malware classifiers. Previous adversarial attacks
against such classifiers only add new features, rather than modifying existing
ones, in order to avoid harming the functionality of the modified malware
executable. Current attacks use a single
algorithm that both selects which features to modify and modifies them blindly,
treating all features the same. In this paper, we present a different approach.
We split the adversarial example generation task into two parts: first, we find
the importance of each feature for a specific sample using explainability
algorithms, and then we conduct a feature-specific modification, feature by
feature.
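The following is a minimal sketch of this two-step procedure, not the paper's
actual implementation: it assumes a fitted scikit-learn-style classifier `clf`,
a sample feature vector `x`, a background dataset for the explainer, and a
hypothetical `modifiers` table mapping each feature index to a
functionality-preserving modification routine.

```python
# A minimal sketch of the two-step attack. `clf`, `background`, and the
# `modifiers` table are hypothetical names used only for illustration.
import numpy as np
import shap

def generate_adversarial(clf, x, background, modifiers, budget=10):
    # Step 1: rank features by their importance for this specific sample,
    # using a model-agnostic explainability algorithm (here, Kernel SHAP).
    f = lambda X: clf.predict_proba(X)[:, 1]      # P(malicious)
    shap_values = shap.KernelExplainer(f, background).shap_values(x)
    ranked = np.argsort(shap_values)[::-1]        # most incriminating first

    # Step 2: apply a feature-specific, functionality-preserving modification
    # to each important feature in turn, until the classifier is evaded.
    x_adv = x.copy()
    for idx in ranked[:budget]:
        modify = modifiers.get(int(idx))
        if modify is None:                        # no safe modifier: skip
            continue
        x_adv = modify(x_adv)
        if clf.predict(x_adv.reshape(1, -1))[0] == 0:   # 0 = benign
            return x_adv
    return x_adv
```

The design point the sketch captures is that feature selection (step 1) and
feature modification (step 2) are decoupled, so each feature type can receive
its own modification routine instead of a single blind one.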
To apply our attack in black-box scenarios, we introduce the concept of
transferability of explainability: applying explainability algorithms to
different classifiers, which use different feature subsets and are trained on
different datasets, still results in a similar subset of important features.
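As an illustration of this idea, transferability of explainability could be
quantified as the overlap between the top-k important features of an attacker's
surrogate model and those of the target; `surrogate`, `target`, and
`background` below are assumed names for the sketch, not part of the paper.

```python
# An illustrative measure of transferability of explainability: the fraction
# of top-k important features shared between a surrogate and a target model.
# All names here are assumptions made for this sketch.
import numpy as np
import shap

def top_k_features(model, x, background, k=20):
    f = lambda X: model.predict_proba(X)[:, 1]
    values = shap.KernelExplainer(f, background).shap_values(x)
    return set(np.argsort(np.abs(values))[::-1][:k])

def explainability_overlap(surrogate, target, x, background, k=20):
    # High overlap suggests that features found important on the attacker's
    # surrogate are also important for the unseen target classifier.
    shared = top_k_features(surrogate, x, background, k) \
             & top_k_features(target, x, background, k)
    return len(shared) / k
```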
We conclude that explainability algorithms can be leveraged by adversaries, and
thus advocates of training more interpretable classifiers should consider the
trade-off: such classifiers are also more vulnerable to adversarial attacks.