Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift
Abstract
Machine learning for malware classification shows encouraging results, but real deployments suffer from performance degradation as malware authors adapt their techniques to evade detection. This phenomenon, known as concept drift, occurs as new malware examples evolve and become less and less like the original training examples. One promising method to cope with concept drift is classification with rejection, in which examples that are likely to be misclassified are instead quarantined until they can be expertly analyzed. We propose TRANSCENDENT, a rejection framework built on Transcend, a recently proposed strategy based on conformal prediction theory. In particular, we provide a formal treatment of Transcend, enabling us to refine conformal evaluation theory (its underlying statistical engine) and gain a better understanding of the theoretical reasons for its effectiveness. In the process, we develop two additional conformal evaluators that match or surpass the performance of the original while significantly decreasing the computational overhead. We evaluate TRANSCENDENT on a malware dataset spanning 5 years that removes sources of experimental bias present in the original evaluation. TRANSCENDENT outperforms state-of-the-art approaches while generalizing across different malware domains and classifiers. To further assist practitioners, we determine the optimal operational settings for a TRANSCENDENT deployment and show how it can be applied to many popular learning algorithms. These insights support both old and new empirical findings, making Transcend a sound and practical solution for the first time. To this end, we release TRANSCENDENT as open source to aid the adoption of rejection strategies by the security community.
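To make the idea of classification with rejection more concrete, the following is a minimal sketch of a p-value-based reject rule in the spirit of conformal evaluation. It is not the TRANSCENDENT implementation: the function names, the nonconformity scores, and the per-class thresholds are all hypothetical, and the p-value here is the plain fraction of calibration points at least as nonconforming as the test point (some conformal formulations add a +1 smoothing term, which this sketch omits).

```python
from typing import Dict, List, Tuple


def p_value(calibration_scores: List[float], test_score: float) -> float:
    """Fraction of calibration points at least as nonconforming as the test point.

    Higher nonconformity scores mean "less like the training data", so a low
    p-value suggests the test example may have drifted.
    """
    if not calibration_scores:
        raise ValueError("need at least one calibration score")
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return at_least / len(calibration_scores)


def classify_with_rejection(
    predicted_label: str,
    test_score: float,
    calibration: Dict[str, List[float]],
    thresholds: Dict[str, float],
) -> Tuple[str, float, bool]:
    """Return (label, p-value, accepted).

    The prediction is accepted only if its p-value, computed against the
    calibration scores of the predicted class, reaches that class's threshold.
    Rejected examples would be quarantined for expert analysis.
    """
    p = p_value(calibration[predicted_label], test_score)
    accepted = p >= thresholds[predicted_label]
    return predicted_label, p, accepted


# Hypothetical nonconformity scores from a held-out calibration set.
calibration = {
    "malware": [0.1, 0.2, 0.3, 0.4, 0.5],
    "goodware": [0.2, 0.4, 0.6, 0.8],
}
# Hypothetical per-class rejection thresholds (assumed to be tuned elsewhere).
thresholds = {"malware": 0.4, "goodware": 0.25}

in_distribution = classify_with_rejection("malware", 0.25, calibration, thresholds)
drifted = classify_with_rejection("malware", 0.45, calibration, thresholds)
```

In this toy setup, the first example resembles the calibration data closely enough to be accepted, while the second has a nonconformity score higher than most calibration points and is rejected. How the nonconformity scores and thresholds are obtained is exactly what a full framework has to specify; the values above are placeholders.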