Abstract
The open set recognition (OSR) problem has been a challenge in many machine
learning (ML) applications, such as security. Because new or unknown malware
families emerge regularly, it is difficult to collect training samples that
cover all classes for an ML system. An advanced malware classification system
should classify the known classes correctly while remaining sensitive to the
unknown class. In this paper, we introduce a self-supervised pre-training
approach for the OSR problem in malware classification. We propose two
transformations for function call graph (FCG)-based malware representations
to facilitate the pretext task. We also present a statistical thresholding
approach to find the optimal threshold for the unknown class. Experimental
results indicate that the proposed pre-training process can improve the
performance of different downstream loss functions for the OSR problem.
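The abstract does not state which statistic the thresholding approach uses. As a rough illustration only, the following sketch shows one common generic scheme, not necessarily the authors' method. It fits the threshold as the mean minus k standard deviations of the top-class scores on known-class validation samples, then labels any test sample whose top score falls below it as unknown. The names `fit_threshold`, `predict`, `UNKNOWN` and the default k = 2 are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical label reserved for the unknown class (assumption, not from the paper).
UNKNOWN = -1

def fit_threshold(known_scores, k=2.0):
    """Hypothetical rule: mean minus k sample standard deviations of the
    top-class scores observed on known-class validation samples."""
    return mean(known_scores) - k * stdev(known_scores)

def predict(class_scores, threshold):
    """Return the argmax class index, or UNKNOWN if the top score is below threshold."""
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    return best if class_scores[best] >= threshold else UNKNOWN

# Usage with made-up scores for illustration only.
val_scores = [0.9, 0.85, 0.95, 0.8, 0.9]
t = fit_threshold(val_scores)          # about 0.766
print(predict([0.1, 0.8, 0.1], t))     # 1
print(predict([0.4, 0.35, 0.25], t))   # -1 (unknown)
```

Under this rule, raising k lowers the threshold, so fewer known samples are rejected at the cost of accepting more unknown samples as known.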