Android malware has been growing at an exponential pace and has become a
serious threat to mobile users. Most anti-malware software still
relies on signature-based detection, which is generally slow and
often unable to detect advanced obfuscated malware. Hence, from time to time,
various authors have proposed different machine learning solutions to identify
sophisticated malware. However, it appears that detection accuracy can be
improved further by using a clustering method. In this paper, we therefore
propose a novel, scalable, and effective clustering method to improve the
detection accuracy for malicious Android applications. With it, the random
forest classifier achieves a better overall accuracy (98.34%) than the regular
method, i.e., taking all the data together to detect the malware. However, as far as true
positive and true negative are concerned, with the clustering method the best
true positive rate is obtained by the decision tree (97.59%) and the best true
negative rate by the support vector machine (99.96%), both of which are close
to the random forest's true positive rate (97.30%) and true negative rate
(99.38%), respectively. The overall accuracy of the random forest is
nevertheless the highest because the true positive rate of the support vector
machine and the true negative rate of the decision tree are significantly
lower than those of the random forest.
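
The link between the per-class rates and the overall accuracy can be illustrated with a minimal sketch. Overall accuracy is the prevalence-weighted mean of the true positive rate and the true negative rate, so a classifier that is best on only one of the two can still lose on accuracy. The function name `overall_accuracy` and the parameter `positive_fraction` are illustrative and do not come from the paper. The equal split between malicious and benign samples is an assumption, since the class balance is not stated here, although the reported random forest figures are consistent with it.

```python
def overall_accuracy(tpr: float, tnr: float, positive_fraction: float = 0.5) -> float:
    """Accuracy as the prevalence-weighted mean of TPR and TNR.

    positive_fraction is the share of malicious samples in the test set;
    0.5 is an assumed value, not one stated in the abstract.
    """
    return positive_fraction * tpr + (1.0 - positive_fraction) * tnr


# Random forest rates as reported in the abstract (clustering method).
rf_accuracy = overall_accuracy(tpr=0.9730, tnr=0.9938)
print(f"{rf_accuracy:.4f}")  # prints 0.9834
```

Under the same assumption, a classifier with a slightly higher rate on one class but a markedly lower rate on the other ends up with a lower weighted mean, which matches the explanation given for the random forest's higher overall accuracy.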