Detection of security smells in IaC scripts through semantics-aware code and language processing

TOP 文献データベース Detection of security smells in IaC scripts through semantics-aware code and language processing

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2509.18790

PDF

https://arxiv.org/pdf/2509.18790

文献情報

作者: Aicha War,Adnan A. Rawass,Abdoul K. Kabore,Jordan Samhi,Jacques Klein,Tegawende F. Bissyande
公開日: 2025-9-25
所属機関: UNIVERSITY OF LUXEMBOURG
所属の国: LUXEMBOURG
会議名

AIにより推定されたラベル

セキュリティ分析プロンプトの検証コード表現技術

Abstract

Infrastructure as Code (IaC) automates the provisioning and management of IT infrastructure through scripts and tools, streamlining software deployment. Prior studies have shown that IaC scripts often contain recurring security misconfigurations, and several detection and mitigation approaches have been proposed. Most of these rely on static analysis, using statistical code representations or Machine Learning (ML) classifiers to distinguish insecure configurations from safe code. In this work, we introduce a novel approach that enhances static analysis with semantic understanding by jointly leveraging natural language and code representations. Our method builds on two complementary ML models: CodeBERT, to capture semantics across code and text, and LongFormer, to represent long IaC scripts without losing contextual information. We evaluate our approach on misconfiguration datasets from two widely used IaC tools, Ansible and Puppet. To validate its effectiveness, we conduct two ablation studies (removing code text from the natural language input and truncating scripts to reduce context) and compare against four large language models (LLMs) and prior work. Results show that semantic enrichment substantially improves detection, raising precision and recall from 0.46 and 0.79 to 0.92 and 0.88 on Ansible, and from 0.55 and 0.97 to 0.87 and 0.75 on Puppet, respectively.