Abstract
Large Language Models (LLMs) are increasingly used in software development to
generate functions, such as attack detectors, that implement security
requirements. A key challenge is ensuring that LLMs have enough knowledge to
address specific security requirements, such as information about existing
attacks. To this end, we propose an approach that integrates Retrieval Augmented
Generation (RAG) and Self-Ranking into the LLM pipeline. RAG enhances the
robustness of the output by incorporating external knowledge sources, while the
Self-Ranking technique, inspired by the concept of Self-Consistency, generates
multiple reasoning paths and ranks them to select the most robust detector.
Our extensive empirical study targets code generated by LLMs to detect two
prevalent injection attacks in web security: Cross-Site Scripting (XSS) and SQL
injection (SQLi). Results show a significant improvement in detection
performance when employing RAG and Self-Ranking, with an increase of up to
71%pt (on average 37%pt) and up to 43%pt (on average 6%pt) in the F2-Score for
XSS and SQLi detection, respectively.
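To make the pipeline concrete, below is a minimal sketch (not the authors' implementation) of how RAG-augmented generation and Self-Ranking could fit together. The toy retriever, the stand-in candidate detectors, and the choice of F2-score on held-out payloads as the ranking criterion are all illustrative assumptions; the paper's exact prompts and ranking mechanism may differ.

import re

# Tiny stand-in knowledge base of known attack payloads (the RAG corpus).
# The paper draws such information from external sources; these literals
# are illustrative only.
KNOWLEDGE_BASE = [
    "<script>alert(1)</script>",        # classic XSS
    "<img src=x onerror=alert(1)>",     # attribute-based XSS
    "' OR '1'='1' --",                  # tautology-based SQLi
    "1; DROP TABLE users",              # stacked-query SQLi
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Naive retriever: rank corpus entries by character overlap with the
    # query. A realistic RAG setup would use embeddings and a vector store.
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(set(doc) & set(query)),
                    reverse=True)
    return ranked[:k]

def build_prompt(task: str) -> str:
    # Augment the code-generation prompt with retrieved attack examples.
    shots = "\n".join(f"- {p}" for p in retrieve(task))
    return f"{task}\nKnown attack payloads for reference:\n{shots}"

def f2_score(detector, labelled: list[tuple[str, int]]) -> float:
    # F2 weights recall twice as heavily as precision (beta = 2), matching
    # the metric in the abstract: F2 = 5*TP / (5*TP + 4*FN + FP).
    tp = fp = fn = 0
    for payload, label in labelled:
        pred = int(bool(detector(payload)))
        if pred and label:
            tp += 1
        elif pred:
            fp += 1
        elif label:
            fn += 1
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0

def self_rank(candidates, validation: list[tuple[str, int]]):
    # One plausible instantiation of Self-Ranking: score each sampled
    # detector on held-out labelled payloads and keep the best one.
    return max(candidates, key=lambda c: f2_score(c, validation))

# Usage: the two lambdas stand in for detector functions that an LLM would
# generate across several samples of a RAG-augmented prompt (build_prompt).
validation = [
    ("<script>alert(1)</script>", 1),
    ("<img src=x onerror=alert(1)>", 1),
    ("hello world", 0),
    ("src=photo.png", 0),
]
candidates = [
    lambda s: "<script" in s.lower(),                        # misses event handlers
    lambda s: bool(re.search(r"<script|onerror\s*=", s, re.I)),
]
best = self_rank(candidates, validation)
print(f2_score(best, validation))  # 1.0: the regex-based candidate is kept

Ranking on held-out labelled data is only one option; a closer analogue of Self-Consistency would compare how strongly the sampled detectors agree with one another on unlabelled inputs.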
External Datasets
Malicious and benign HTTP request payloads from the FMereani repository
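For reproduction purposes, a hedged sketch of loading such labelled payloads into the (payload, label) format used in the ranking example above; the file path and column names here are hypothetical, not the repository's actual layout.

import csv

def load_payloads(path: str) -> list[tuple[str, int]]:
    # Hypothetical loader: assumes a CSV with "payload" and "label" columns
    # (1 = malicious, 0 = benign); adapt to the repository's real files.
    with open(path, newline="", encoding="utf-8") as fh:
        return [(row["payload"], int(row["label"]))
                for row in csv.DictReader(fh)]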