Most anti-malware tools today are signature-based, which makes them
ineffective at detecting advanced unknown malware such as metamorphic
malware. In this paper, we study the frequency of opcode occurrence to detect
unknown malware using machine learning techniques. For this purpose, we use
the Kaggle Microsoft Malware Classification Challenge dataset. The top 20
features obtained from the Fisher score, information gain, gain ratio,
chi-square, and symmetric uncertainty feature selection methods are compared. We also
studied multiple classifiers available in WEKA, a GUI-based machine learning
tool, and found that five of them (Random Forest, LMT, NBT, J48 Graft, and
REPTree) detect malware with almost 100% accuracy.
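
As an illustration of the feature pipeline summarized above, the following is a minimal sketch of computing opcode-frequency features and ranking them by Fisher score. It is not the paper's implementation: the function names, the four-opcode vocabulary, and the toy samples are all hypothetical, and the Fisher score is taken in its common form (between-class scatter divided by within-class scatter), which is assumed here rather than stated in the text.

```python
from collections import Counter

def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one sample."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero on empty samples
    return [counts[op] / total for op in vocabulary]

def fisher_scores(X, y):
    """Fisher score per feature (assumed form): between-class scatter
    divided by within-class scatter, both weighted by class size."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def top_k_features(X, y, names, k=20):
    """Names of the k features with the highest Fisher score."""
    ranked = sorted(zip(names, fisher_scores(X, y)),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical toy data: label 0 = benign, label 1 = malware.
VOCAB = ["mov", "push", "call", "xor"]
samples = [
    (["mov", "mov", "push", "call"], 0),
    (["mov", "push", "push", "call"], 0),
    (["xor", "xor", "mov", "call"], 1),
    (["xor", "xor", "xor", "call"], 1),
]
X = [opcode_frequencies(ops, VOCAB) for ops, _ in samples]
y = [label for _, label in samples]
scores = fisher_scores(X, y)
top = top_k_features(X, y, VOCAB, k=2)
print(top)
```

On this toy data, `xor` and `push` rank highest, while `call`, which has the same frequency in every sample, scores zero. With real disassembly, the vocabulary would be the set of opcodes observed in the dataset and `k` would be 20, as in the comparison described above.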