CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts


arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2606.19235

PDF

https://arxiv.org/pdf/2606.19235

Paper Information

Authors: Po-Han Cheng, Chia-Mu Yu, Ying-Dar Lin, Yu-Sung Wu, Wei-Bin Lee
Published: 6-18-2026
Affiliation: National Yang Ming Chiao Tung University
Country: Taiwan

Labels Estimated by AI

Prompt validation, Adversarial attack, Vulnerability detection methods

These labels were automatically added by AI and may be inaccurate.

Abstract

Code large language models (Code LLMs) increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent environments. This creates an indirect prompt-injection surface: attackers can hide instructions in comments, strings, identifiers, or decoy code. We propose CodeSentinel, a three-layer inference-time sanitizer. It uses Tree-sitter to extract high-risk, model-facing concrete syntax tree (CST) nodes, then combines syntax-guided pre-filtering, CST-guided Dynamic Min-K% scoring, and node perturbation analysis to detect both adversarial and natural-looking semantic triggers. Detected nodes are removed or neutralized before they reach the downstream Code LLM. Across six recent attack families, CodeSentinel achieves an average node-level F1 of 0.80, outperforming CodeGarrison, DePA, and KillBadCode.
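The abstract names the ingredients (extracting model-facing nodes, scoring them with Min-K%, then removing or neutralizing flagged nodes) but not their implementation. The following is a minimal sketch of only the score-then-neutralize idea, under loud assumptions. Python's standard-library `tokenize` stands in for Tree-sitter, so only comments and string literals in Python source are considered (no identifiers, no other languages). It uses plain, static Min-K% Prob with a fixed `k` and `threshold` rather than the paper's CST-guided Dynamic Min-K%, and it omits the pre-filter and perturbation layers. `extract_nodes`, `min_k_score`, `sanitize`, `toy_logprobs`, and the placeholders are hypothetical names and choices; `toy_logprobs` is a toy substitute for per-token log-probabilities from a real Code LLM.

```python
import io
import math
import tokenize
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    kind: str   # "comment" or "string"
    text: str   # raw token text as it appears in the source
    start: int  # absolute character offset where the token begins
    end: int    # absolute character offset just past the token


def extract_nodes(source: str) -> list[Node]:
    """Collect comment and string-literal tokens from Python source.

    Hypothetical stand-in for Tree-sitter CST extraction: uses the stdlib
    tokenizer, so it sees only Python comments and string tokens.
    """
    # Offset of each line start, to map tokenize's (row, col) to absolute offsets.
    line_starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            line_starts.append(i + 1)

    def offset(pos: tuple[int, int]) -> int:
        row, col = pos  # rows are 1-indexed, columns 0-indexed
        return line_starts[row - 1] + col

    nodes = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.STRING):
            kind = "comment" if tok.type == tokenize.COMMENT else "string"
            nodes.append(Node(kind, tok.string, offset(tok.start), offset(tok.end)))
    return nodes


def min_k_score(token_logprobs: list[float], k: float = 0.2) -> float:
    """Min-K% Prob: mean log-probability of the lowest k-fraction of tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    n = max(1, math.ceil(k * len(token_logprobs)))
    return sum(sorted(token_logprobs)[:n]) / n


def sanitize(
    source: str,
    token_logprobs: Callable[[str], list[float]],
    k: float = 0.2,
    threshold: float = -6.0,  # hypothetical fixed cut-off, not from the paper
) -> tuple[str, list[Node]]:
    """Neutralize nodes whose Min-K% score falls below a fixed threshold."""
    flagged = [
        node for node in extract_nodes(source)
        if min_k_score(token_logprobs(node.text), k) < threshold
    ]
    cleaned = source
    # Splice from the last node backwards so earlier offsets stay valid.
    for node in sorted(flagged, key=lambda n: n.start, reverse=True):
        placeholder = "# [removed]" if node.kind == "comment" else '""'
        cleaned = cleaned[:node.start] + placeholder + cleaned[node.end:]
    return cleaned, flagged


# --- Usage with a toy scorer (hypothetical stand-in for a Code LLM) ---
COMMON = {"#", "return", "the", "sum", "of", "a", "and", "b"}


def toy_logprobs(text: str) -> list[float]:
    # Whitespace "tokens": familiar words are likely, anything else is not.
    return [-1.0 if word in COMMON else -12.0 for word in text.split()]


src = (
    "def add(a, b):\n"
    "    # return the sum of a and b\n"
    "    return a + b  # zx!! qq7 vv0\n"
)
cleaned, flagged = sanitize(src, toy_logprobs)
print(cleaned)  # the gibberish trailing comment is replaced by "# [removed]"
```

Splicing from the highest offset downward avoids recomputing positions after each replacement. A fixed placeholder keeps the code parseable, but swapping a string literal for `""` can change program behavior, so this sketch illustrates the detection signal rather than a safe neutralization policy.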

External Datasets

XOXO

ITGen

Flashboom

ShadowCode

INSEC

CoTDeceptor