Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation

Authors: Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie | Published: 2025-08-25

2025.08.25 / 2025.08.27

Source: https://arxiv.org/abs/2508.18148

PDF: https://arxiv.org/pdf/2508.18148

Labels Predicted by AI

Data Generation Method, Training Method, Watermark

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.

Abstract

Intrusion Detection Systems (IDS) play a crucial role in network security defense. However, a significant challenge in training IDS detection models is the shortage of adequately labeled malicious samples. To address this issue, this paper introduces GANGRL-LLM, a novel semi-supervised framework that integrates Generative Adversarial Networks (GANs) with Large Language Models (LLMs) to enhance malicious code generation and SQL Injection (SQLi) detection capabilities in few-sample learning scenarios. Specifically, our framework adopts a collaborative training paradigm in which (1) the GAN-based discriminator improves malicious pattern recognition through adversarial learning with generated samples and limited real samples, and (2) the LLM-based generator refines the quality of malicious code synthesis using reward signals from the discriminator. The experimental results demonstrate that, even with a limited number of labeled samples, our training framework is highly effective in enhancing both malicious code generation and detection capabilities. This dual enhancement capability offers a promising solution for developing adaptive defense systems capable of countering evolving cyber threats.
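
The abstract does not give the training algorithm, so the following is only a minimal sketch of the general idea of a generator learning from a discriminator's reward signal. It is not the GANGRL-LLM implementation. It assumes a REINFORCE-style policy-gradient update, a toy categorical "generator" over four abstract symbols in place of an LLM, and a fixed placeholder scoring function in place of a trained discriminator (in the paper the discriminator is itself trained adversarially). All names here (`reinforce_step`, `toy_discriminator`, `train`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_discriminator(sample):
    """Placeholder reward (assumption): symbol 2 is scored as 'realistic'."""
    return 1.0 if sample == 2 else 0.0


def reinforce_step(logits, reward_fn, rng, lr=0.5, batch=64):
    """One REINFORCE update of the categorical generator's logits."""
    probs = softmax(logits)
    samples = rng.choices(range(len(logits)), weights=probs, k=batch)
    rewards = [reward_fn(s) for s in samples]
    baseline = sum(rewards) / batch  # mean-reward baseline reduces variance
    grads = [0.0] * len(logits)
    for s, r in zip(samples, rewards):
        advantage = r - baseline
        for k in range(len(logits)):
            # d log p(s) / d logit_k = 1[k == s] - p_k
            indicator = 1.0 if k == s else 0.0
            grads[k] += advantage * (indicator - probs[k]) / batch
    # Gradient ascent on expected reward
    return [l + lr * g for l, g in zip(logits, grads)]


def train(steps=200, seed=0):
    """Run the reward-guided updates and return the final distribution."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0, 0.0]  # start from a uniform generator
    for _ in range(steps):
        logits = reinforce_step(logits, toy_discriminator, rng)
    return softmax(logits)
```

Calling `train()` returns the generator's final probability over the four symbols; under these assumptions most of the mass should move to the symbol the placeholder discriminator rewards. In a collaborative setup like the one the abstract describes, the reward function would itself be updated between generator steps rather than held fixed.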