Control Flow Change in Assembly as a Classifier in Malware Analysis

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1609.02947

PDF

https://arxiv.org/pdf/1609.02947

Paper Information

Author: Andree Linke, Nhien-An Le-Khac
Published: 9-10-2016
Affiliation: School of Computer Science, University College Dublin
Country: Ireland
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Bayesian Classification

Data Extraction and Analysis

Feature Selection Method

These labels were automatically added by AI and may be inaccurate.

Abstract

Classical signature-based malware detection methods currently fail to detect new malware and are not always effective against new obfuscation techniques. Moreover, new malware is easily created, and old malware can be recoded to produce new variants. Classical antivirus software therefore becomes steadily less effective against these new threats. Malware is also hand-tailored to bypass network security and antivirus products. Because analysts do not have enough time to dissect suspected malware by hand, automated approaches have been developed. To cope with the mass of new malware, statistical and machine learning methods have proved to be a good approach to classifying programs, especially when multiple approaches are combined to estimate the likelihood that software is malicious. Existing approaches mostly analyze the opcodes or mnemonics of the disassembly and their distribution. In this paper, we focus on the control flow change (CFC) itself and investigate whether it is significant for detecting malware. Within the scope of this work, only relative control flow changes are considered, as these are easier to extract from the first chosen disassembler library and lie within a range of 256 addresses. These features are analyzed as a raw feature and as n-grams of length 2, 4 and 6; the even more abstract feature of the number of occurrences of the n-grams is also used. Statistical methods as well as the Naive Bayes algorithm were used to find out whether there is significant data in CFC. We also test our approach with real-world datasets.
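The n-gram and Naive Bayes steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sample has already been reduced to a sequence of relative jump offsets (treated here as signed 8-bit values, one reading of the "range of 256 addresses"), and it uses a multinomial Naive Bayes with add-one smoothing as a stand-in. The function names and the toy offset sequences are hypothetical and are not data from the paper.

```python
from collections import Counter
import math


def cfc_ngrams(offsets, n):
    """Return the overlapping n-grams (as tuples) of a CFC offset sequence."""
    return [tuple(offsets[i:i + n]) for i in range(len(offsets) - n + 1)]


def train_naive_bayes(samples, n):
    """Count n-gram occurrences per class.

    samples: list of (offsets, label) pairs.
    Returns (per-class n-gram counts, per-class sample counts, vocabulary).
    """
    counts = {}
    docs = Counter()
    for offsets, label in samples:
        docs[label] += 1
        counts.setdefault(label, Counter()).update(cfc_ngrams(offsets, n))
    vocab = set()
    for class_counts in counts.values():
        vocab.update(class_counts)
    return counts, docs, vocab


def classify(offsets, model, n):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in counts:
        score = math.log(docs[label] / total_docs)  # class prior
        denom = sum(counts[label].values()) + len(vocab)
        for gram in cfc_ngrams(offsets, n):
            score += math.log((counts[label][gram] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up offset sequences purely for illustration (not paper data).
training = [
    ([2, 5, -3, 2, 5], "goodware"),
    ([2, 5, 2, 5, -3], "goodware"),
    ([-120, 127, -120, 127], "malware"),
    ([127, -120, 127, -120], "malware"),
]
model = train_naive_bayes(training, 2)
print(classify([-120, 127, -120], model, 2))
print(classify([2, 5, -3], model, 2))
```

The same functions accept n = 4 or n = 6, the other n-gram lengths mentioned above. How the offsets are extracted from a disassembler is left out of the sketch.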

External Datasets

malware archive of the University of Bonn

malware database of contagio.blogspot.de

malware database of nothink.org

dataset of 574 goodware samples and 94 malware samples for Naive Bayes Classifier

n-grams of 626 samples of goodware and 95 samples of malware