Abstract
Classical signature-based malware detection fails to detect new malware and is
not always effective against new obfuscation techniques. Moreover, new malware
is easy to create, and old malware can be recoded to produce new variants.
Classical antivirus software is therefore becoming steadily less effective
against these threats. In addition, malware is hand-tailored to bypass network
security and antivirus software. Because analysts do not have enough time to
dissect suspected malware by hand, automated approaches have been developed. To
cope with the mass of new malware, statistical and machine learning methods
have proved to be a good approach to classifying programs, especially when
several approaches are combined to estimate the likelihood that a piece of
software is malicious. Existing approaches have taken some steps in this
direction, mostly by analyzing the opcodes or mnemonics of the disassembly and
their distribution. In this paper, we focus on the control flow change (CFC)
itself and investigate whether it is significant for detecting malware. Within
the scope of this work, only relative control flow changes are considered, as
these are easier to extract from the first chosen disassembler library and lie
within a range of 256 addresses. These features are analyzed as a raw feature
and as n-grams of length 2, 4 and 6; the even more abstract feature of the
occurrences of the n-grams is also used. Statistical methods as well as the
Naive Bayes algorithm were used to find out whether CFC carries significant
data. We also test our approach on real-world datasets.
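
As a rough illustration of the feature pipeline described above, the following
Python sketch turns a sequence of relative control flow changes into n-grams,
counts their occurrences, and passes the counts to a small Naive Bayes
classifier. It is not the implementation used in this work. The function and
class names, the multinomial variant with Laplace smoothing, the representation
of each CFC as a signed offset between -128 and 127, and the toy sequences are
all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples (empty if seq is too short)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_counts(seq, n):
    """Count how often each n-gram occurs in one sample."""
    return Counter(ngrams(seq, n))


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes over n-gram counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed, not from the paper)

    def fit(self, count_vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        for counts in count_vectors:
            self.vocab.update(counts)
        self.log_prior, self.feature_counts, self.totals = {}, {}, {}
        for cls in self.classes:
            docs = [c for c, y in zip(count_vectors, labels) if y == cls]
            self.log_prior[cls] = math.log(len(docs) / len(labels))
            aggregated = Counter()
            for counts in docs:
                aggregated.update(counts)
            self.feature_counts[cls] = aggregated
            self.totals[cls] = sum(aggregated.values())
        return self

    def log_scores(self, counts):
        """Unnormalized log posterior per class; unseen n-grams are ignored."""
        vocab_size = len(self.vocab)
        scores = {}
        for cls in self.classes:
            score = self.log_prior[cls]
            denom = self.totals[cls] + self.alpha * vocab_size
            for gram, k in counts.items():
                if gram in self.vocab:
                    seen = self.feature_counts[cls][gram]
                    score += k * math.log((seen + self.alpha) / denom)
            scores[cls] = score
        return scores

    def predict(self, counts):
        scores = self.log_scores(counts)
        return max(scores, key=scores.get)


# Toy CFC sequences (invented): each value is a relative jump offset.
N = 2
goodware = [[2, 5, -3, 2, 5, -3], [2, 5, 2, 5, -3, 4]]
malware = [[-2, -2, 127, -2, -2, 127], [127, -2, -2, 127, -2, -2]]

samples = [ngram_counts(s, N) for s in goodware + malware]
labels = ["goodware"] * len(goodware) + ["malware"] * len(malware)
model = MultinomialNaiveBayes().fit(samples, labels)
```

Calling `model.predict(ngram_counts(sequence, N))` on a new sequence returns
the more likely label under these toy assumptions. Changing `N` to 4 or 6 gives
the longer n-grams mentioned in the abstract.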
External Datasets
malware archive of the University of Bonn
malware database of contagio.blogspot.de
malware database of nothink.org
dataset of 574 goodware samples and 94 malware samples for Naive Bayes Classifier
n-grams of 626 samples of goodware and 95 samples of malware