Abstract
In this paper, we propose a framework for early-stage malware detection and
mitigation by leveraging natural language processing (NLP) techniques and
machine learning algorithms. Our primary contribution is an approach that
predicts the upcoming actions of malware: we treat application programming
interface (API) call sequences as natural language inputs and apply text
classification methods, specifically a Bi-LSTM neural network, to predict the
next API call. This enables proactive threat identification and
mitigation, demonstrating the effectiveness of applying NLP principles to API
call sequences. The Bi-LSTM model is evaluated using two datasets. The model
achieved an accuracy of 93.6% and 88.8% on the first and second datasets,
respectively. Additionally, by modeling consecutive API calls as 2-gram and
3-gram strings, we extract new features to be further processed using a
Bagging-XGBoost algorithm, effectively predicting malware presence at its early
stages. The accuracy of the proposed framework is evaluated by simulations.
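
The 2-gram and 3-gram modeling of consecutive API calls mentioned above can be illustrated with a minimal sketch. The abstract does not specify how the strings are encoded, so the `|` separator, the function names `api_ngrams` and `ngram_features`, and the sample trace are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def api_ngrams(calls: list[str], n: int) -> list[str]:
    """Join every run of n consecutive API calls into one n-gram string.

    The "|" separator is an assumption; any character that cannot
    appear in an API name would work equally well.
    """
    return ["|".join(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def ngram_features(calls: list[str], orders: tuple[int, ...] = (2, 3)) -> Counter:
    """Count how often each 2-gram and 3-gram occurs in one API call trace."""
    counts: Counter = Counter()
    for n in orders:
        counts.update(api_ngrams(calls, n))
    return counts


# Hypothetical trace, for illustration only (not taken from the paper's datasets).
trace = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]
features = ngram_features(trace)

for ngram, count in sorted(features.items()):
    print(f"{count}  {ngram}")
```

Counts of this kind would typically be mapped to a fixed-length vector before being passed to a classifier such as the Bagging-XGBoost model named in the abstract; that step is not shown here.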