Malicious software (malware) is one of the pivotal security threats to
embedded computing systems. Owing to its efficiency and efficacy, Machine
Learning (ML) has been widely adopted for malware detection in recent times.
Despite their effectiveness, existing techniques require a tremendous number of
benign and malware samples to train an effective malware detector. This
requirement limits the detection of emerging malware, for which too few samples
are available for effective training. To address these concerns, we introduce a
code-aware data generation technique that generates multiple mutated samples of
malware that the devices have seen only a limited number of times. Loss
minimization ensures that the generated samples closely mimic the limitedly
seen malware and that impractical samples are suppressed. The generated malware
is then incorporated into the training set to build a model that can
efficiently detect emerging malware despite limited exposure. The experimental
results demonstrate that the proposed technique achieves an accuracy of 90% in
detecting limitedly seen malware, approximately 3x the accuracy attained by
state-of-the-art techniques.
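
The augmentation idea described above can be illustrated with a minimal sketch.
It is not the authors' method: the abstract does not specify the code-aware
mutation or the loss function, so the sketch assumes a malware sample is
represented as a numeric feature vector, uses random Gaussian perturbation as a
stand-in mutation, and replaces loss minimization with a simple filter that
keeps only variants whose mean squared error with respect to the seed sample is
small. The names `mutate`, `mse`, `augment`, and all parameter values are
hypothetical.

```python
import random

def mutate(seed, rate=0.2, scale=0.1, rng=random):
    """Perturb a random subset of features of one seed sample (assumed mutation)."""
    return [x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in seed]

def mse(a, b):
    """Mean squared error between two feature vectors (assumed loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def augment(seed, n_samples=50, max_loss=0.01, rng=random):
    """Generate mutated variants and keep those that stay close to the seed."""
    kept = []
    for _ in range(n_samples):
        candidate = mutate(seed, rng=rng)
        # Reject variants that drift too far from the limitedly seen sample.
        if mse(candidate, seed) <= max_loss:
            kept.append(candidate)
    return kept

rng = random.Random(0)
# A single "limitedly seen" malware sample, here a random 16-feature vector.
seed_sample = [rng.random() for _ in range(16)]
variants = augment(seed_sample, rng=rng)

# Label 1 = malware; the variants join the seed in the training set.
training_set = [(seed_sample, 1)] + [(v, 1) for v in variants]
```

In this sketch the threshold `max_loss` plays the role of the constraint that
discards impractical samples; a real detector would then be trained on
`training_set` together with benign samples.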
External datasets
VirusTotal dataset
benign application files
random obfuscated malware
stealthy malware