Abstract
Malware attacks have become significantly more frequent and sophisticated in
recent years. Therefore, malware detection and classification are critical
components of information security. Because of the large number of malware
samples available, it is essential to categorize them according to their
malicious characteristics. Clustering algorithms are thus becoming more widely
used in computer security to analyze the behavior of malware variants and
discover new malware families. Online clustering algorithms help us
understand malware behavior and respond more quickly to new threats. This
paper introduces a novel machine learning-based model for the online clustering
of malicious samples into malware families. A clustering decision rule divides
the streaming data into samples from known families and samples from new,
emerging malware families. Samples from known families are classified using the
weighted k-nearest neighbor classifier, and the online k-means algorithm
clusters the remaining samples, achieving a cluster purity ranging from
90.20% for four clusters to 93.34% for ten clusters. This work is based on
static analysis of portable executable files for the Windows operating system.
Experimental results indicate that the proposed online clustering model can
create high-purity clusters corresponding to malware families. This allows
malware analysts to receive similar malware samples, speeding up their
analysis.
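
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering decision rule, the k-NN weighting scheme, or the feature vectors, so the distance threshold in `is_known`, the inverse-distance weights, the sequential (MacQueen-style) centroid update, and all function names here are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def is_known(x, labeled, threshold):
    """Hypothetical decision rule (assumption): a sample is treated as
    'known' if its nearest labeled sample lies within `threshold`."""
    return min(dist(x, vec) for vec, _ in labeled) <= threshold


def weighted_knn(x, labeled, k=3):
    """Distance-weighted k-NN vote; inverse-distance weights are assumed."""
    nearest = sorted(labeled, key=lambda s: dist(x, s[0]))[:k]
    votes = defaultdict(float)
    for vec, family in nearest:
        votes[family] += 1.0 / (dist(x, vec) + 1e-9)
    return max(votes, key=votes.get)


class OnlineKMeans:
    """Sequential k-means: the first k samples seed the centroids, then
    each new sample moves its nearest centroid toward itself."""

    def __init__(self, k):
        self.k = k
        self.centroids = []
        self.counts = []

    def partial_fit(self, x):
        if len(self.centroids) < self.k:
            self.centroids.append(list(x))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = min(range(self.k), key=lambda i: dist(x, self.centroids[i]))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # running-mean update
        self.centroids[j] = [c + step * (xi - c)
                             for c, xi in zip(self.centroids[j], x)]
        return j


def process_stream(stream, labeled, k_new, threshold):
    """Route each streamed sample to k-NN (known) or online k-means (new)."""
    km = OnlineKMeans(k_new)
    known, new = [], []
    for x in stream:
        if is_known(x, labeled, threshold):
            known.append((x, weighted_knn(x, labeled)))
        else:
            new.append((x, km.partial_fit(x)))
    return known, new


def purity(cluster_ids, true_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    by_cluster = defaultdict(Counter)
    for c, y in zip(cluster_ids, true_labels):
        by_cluster[c][y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(true_labels)
```

Here `labeled` would be a list of `(feature_vector, family)` pairs from known families and `stream` an iterable of feature vectors. In practice the features would come from static analysis of PE files, which this sketch does not cover.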