The impressive growth of smartphone devices in combination with the rising
ubiquity of using mobile platforms for sensitive applications such as Internet
banking, have triggered a rapid increase in mobile malware. In recent
literature, many studies examine Machine Learning techniques, as the most
promising approach for mobile malware detection, without however quantifying
the uncertainty involved in their detections. In this paper, we address this
problem by proposing a machine learning dynamic analysis approach that provides
provably valid confidence guarantees in each malware detection. Moreover the
particular guarantees hold for both the malicious and benign classes
independently and are unaffected by any bias in the data. The proposed approach
is based on a novel machine learning framework, called Conformal Prediction,
combined with a random forests classifier. We examine its performance on a
large-scale dataset collected by installing 1866 malicious and 4816 benign
applications on a real android device. We make this collection of dynamic
analysis data available to the research community. The obtained experimental
results demonstrate the empirical validity, usefulness and unbiased nature of
the outputs produced by the proposed approach.
外部データセット
large-scale dataset
dataset with state measurements of a real Android device containing 6682 applications