We have identified a methodological problem in the empirical evaluation of
the string encryption detection capabilities of the AndrODet system described
by Mirzaei et al. in the recent paper "AndrODet: An adaptive Android
obfuscation detector". The accuracy of string encryption detection is evaluated
using samples from the AMD and PraGuard malware datasets. However, the authors
did not account for the fact that many of the AMD samples are highly similar
because they belong to the same malware family. This introduces a
risk that a machine learning system trained on these samples could fail to
learn a generalizable model for string encryption detection, and might instead
learn to classify samples based on characteristics of each malware family. Our
own evaluation strongly indicates that the reported high accuracy of AndrODet's
string encryption detection is indeed due to this phenomenon. When we
evaluated AndrODet while ensuring that samples from the same family never
appeared in both training and testing data, the accuracy dropped to around 50%.
Moreover, the PraGuard dataset is not suitable for evaluating a static string
encryption detector such as AndrODet, since the particular obfuscation tool
used to produce the dataset effectively makes it impossible to extract
meaningful features of static strings in Android apps.
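
The family-aware evaluation described above relies on a train/test split in
which no malware family contributes samples to both sides. The following is a
minimal sketch of such a split, not the procedure used in our experiments or
in AndrODet itself. The function name family_disjoint_split, the
(sample_id, family) input format, the test_fraction and seed parameters, and
the family labels in the usage lines are all assumptions made for
illustration.

```python
import random
from collections import defaultdict


def family_disjoint_split(samples, test_fraction=0.3, seed=0):
    """Split (sample_id, family) pairs so that no family spans both sets.

    Hypothetical helper: whole families are assigned to either the
    training set or the test set, never to both.
    """
    # Group sample identifiers by their malware family label.
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    # Shuffle the family names deterministically for reproducibility.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    # Fill the test set family by family until it reaches the target size,
    # then send every remaining family to the training set.
    target = test_fraction * len(samples)
    train, test = [], []
    for family in families:
        bucket = test if len(test) < target else train
        bucket.extend(by_family[family])
    return train, test


# Usage with made-up sample identifiers and family labels.
example = [
    ("a1", "FamA"), ("a2", "FamA"),
    ("b1", "FamB"),
    ("c1", "FamC"), ("c2", "FamC"),
    ("d1", "FamD"),
]
train_ids, test_ids = family_disjoint_split(example)
```

Because families differ in size, the resulting test set only approximates
test_fraction. A purely random per-sample split would instead let near-
duplicate samples of one family land on both sides, which is the leakage
discussed above.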