Symbolic analysis meets federated learning to enhance malware identifier

Abstract

In recent years, manually writing detection rules has become impractical for anti-malware products because the number of malware threats keeps growing. Machine learning is therefore a promising way to make malware recognition more efficient. Traditional centralized machine learning requires a large amount of data to train a well-performing model. To improve malware detection, the training data may come from various sources, such as host, network, and cloud-based anti-malware components, or even from different enterprises. To avoid the cost of data collection as well as the leakage of private data, we present a federated learning system that identifies malware through behavioural graphs, i.e., system call dependency graphs. It is based on a deep learning model comprising a graph autoencoder and a multi-classifier module. The model is trained with a secure learning protocol among clients to protect private data against inference attacks. Using the model to identify malware, we achieve an accuracy of 85% on homogeneous graph data and 93% on inhomogeneous graph data.
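
The abstract does not say how the secure learning protocol works. As a rough illustration only, the sketch below shows one common pattern for federated training, pairwise additive masking, in which each pair of clients adds and subtracts the same random mask so that the server can recover the average update without seeing any individual one. The function names `masked_updates` and `aggregate` are hypothetical, and the single shared random generator stands in for per-pair shared secrets; this is not claimed to be the protocol used in the paper.

```python
import random


def masked_updates(updates, seed=0):
    """Return copies of the client updates with pairwise masks added.

    Assumption: in a real protocol each client pair (i, j) would derive its
    mask from a shared secret; here one seeded generator simulates that.
    """
    rng = random.Random(seed)
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] += mask[k]  # client i adds the pair mask
                masked[j][k] -= mask[k]  # client j subtracts the same mask
    return masked


def aggregate(masked):
    """Average the masked updates; the pairwise masks cancel in the sum."""
    n, dim = len(masked), len(masked[0])
    return [sum(m[k] for m in masked) / n for k in range(dim)]


if __name__ == "__main__":
    # Three hypothetical clients, each holding a two-parameter model update.
    client_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    hidden = masked_updates(client_updates)
    print(aggregate(hidden))
```

The server only ever handles the masked vectors, yet the average it computes matches the average of the raw updates up to floating-point rounding, which is the property that limits what an inference attack on a single client's update can learn.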
