RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

Authors: Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari, Michael Gentile, Zach Reavis, David Magnotti, Wayne Fullen | Published: 2026-04-02

2026.04.022026.04.04

Authors: Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari, Michael Gentile, Zach Reavis, David Magnotti, Wayne Fullen
Published: 2026-04-02

Source: https://arxiv.org/abs/2604.01977

PDF: https://arxiv.org/pdf/2604.01977

Labels Predicted by AI

LLM Performance Evaluation

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Security teams face a challenge: the volume of newly disclosed Common Vulnerabilities and Exposures (CVEs) far exceeds the capacity to manually develop detection mechanisms. In 2025, the National Vulnerability Database published over 48,000 new vulnerabilities, motivating the need for automation. We present RuleForge, an AWS internal system that automatically generates detection rules–JSON-based patterns that identify malicious HTTP requests exploiting specific vulnerabilities–from structured Nuclei templates describing CVE details. Nuclei templates provide standardized, YAML-based vulnerability descriptions that serve as the structured input for our rule generation process. This paper focuses on RuleForge’s architecture and operational deployment for CVE-related threat detection, with particular emphasis on our novel LLM-as-a-judge (Large Language Model as judge) confidence validation system and systematic feedback integration mechanism. This validation approach evaluates candidate rules across two dimensions–sensitivity (avoiding false negatives) and specificity (avoiding false positives)–achieving AUROC of 0.75 and reducing false positives by 67