Abstract
Machine learning methods can detect Android malware with very high accuracy.
However, these classifiers have an Achilles heel, concept drift: they rapidly
become out of date and ineffective, due to the evolution of malware apps and
benign apps. Our research finds that, after training an Android malware
classifier on one year's worth of data, the F1 score quickly dropped from 0.99
to 0.76 after 6 months of deployment on new test samples.
In this paper, we propose new methods to combat the concept drift problem of
Android malware classifiers. Since machine learning techniques need to be
continuously deployed, we use active learning: we select new samples for
analysts to label, and then add the labeled samples to the training set to
retrain the classifier. Our key idea is that similarity-based uncertainty is more
robust against concept drift. Therefore, we combine contrastive learning with
active learning. We propose a new hierarchical contrastive learning scheme, and
a new sample selection technique to continuously train the Android malware
classifier. Our evaluation shows that this leads to significant improvements,
compared to previously published methods for active learning. Our approach
reduces the false negative rate from 14% (for the best baseline) to 9%, while
also reducing the false positive rate (from 0.86% to 0.48%). Also, our approach
maintains more consistent performance across a seven-year time period than past
methods.
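
The active learning loop described above (score new samples, send the most uncertain ones to analysts, add the labeled samples to the training set) can be illustrated with a minimal sketch. This is not the paper's hierarchical contrastive scheme or its sample selector. It assumes embeddings are already computed by some encoder, and it swaps in a simple stand-in for similarity-based uncertainty: the mean Euclidean distance from a new sample to its k nearest training samples. The names `knn_distance_uncertainty`, `select_for_labeling`, `active_learning_round`, and `oracle` are hypothetical.

```python
import numpy as np


def knn_distance_uncertainty(train_emb, new_emb, k=3):
    """Stand-in uncertainty: mean distance to the k nearest training embeddings.

    Assumption: samples far from everything seen in training are the ones
    most likely affected by drift. This is not the paper's scoring function.
    """
    # Pairwise Euclidean distances, shape (n_new, n_train).
    diffs = new_emb[:, None, :] - train_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    k = min(k, train_emb.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def select_for_labeling(train_emb, new_emb, budget, k=3):
    """Return indices of the `budget` most uncertain new samples."""
    scores = knn_distance_uncertainty(train_emb, new_emb, k)
    return np.argsort(-scores)[:budget]


def active_learning_round(train_emb, train_labels, new_emb, oracle, budget, k=3):
    """One round: pick samples, ask the analyst (`oracle`), grow the training set.

    Retraining the encoder and classifier on the enlarged set is omitted here.
    """
    picked = select_for_labeling(train_emb, new_emb, budget, k)
    new_labels = np.array([oracle(i) for i in picked])
    train_emb = np.vstack([train_emb, new_emb[picked]])
    train_labels = np.concatenate([train_labels, new_labels])
    return train_emb, train_labels, picked


if __name__ == "__main__":
    train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    labels = np.array([0, 0, 1])
    new = np.array([[0.1, 0.1], [5.0, 5.0], [0.5, 0.5]])
    # The hypothetical analyst labels every selected sample as malicious (1).
    _, _, picked = active_learning_round(train, labels, new, lambda i: 1, budget=1)
    print(picked)
```

In this toy run the sample at (5, 5) lies farthest from all training points, so it is the one sent for labeling. A distance-based score is only one possible choice; the point of the sketch is the shape of the loop, not the scoring rule.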