Abstract
Today's security tools predominantly rely on predefined rules crafted by
experts, making them poorly adapted to the emergence of software supply chain
attacks. To tackle this limitation, we propose a novel tool, RuleLLM, which
leverages large language models (LLMs) to automate rule generation for OSS
ecosystems. RuleLLM extracts metadata and code snippets from malware as its
input, producing YARA and Semgrep rules that can be directly deployed in
software development. Specifically, the rule generation task involves three
subtasks: crafting rules, refining rules, and aligning rules. To validate
RuleLLM's effectiveness, we implemented a prototype system and conducted
experiments on a dataset of 1,633 malicious packages. The results are
promising: RuleLLM generated 763 rules (452 YARA and 311 Semgrep) with a
precision of 85.2% and a recall of 91.8%, outperforming state-of-the-art
(SOTA) tools and score-based approaches. We further analyzed the generated rules
and proposed a rule taxonomy of 11 categories and 38 subcategories.