ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses

Authors: Ningyuan He, Ronghong Huang, Qianqian Tang, Hongyu Wang, Xianghang Mi, Shanqing Guo | Published: 2026-01-29

2026.01.29

Authors: Ningyuan He, Ronghong Huang, Qianqian Tang, Hongyu Wang, Xianghang Mi, Shanqing Guo
Published: 2026-01-29

Source: https://arxiv.org/abs/2601.21586

PDF: https://arxiv.org/pdf/2601.21586

AIにより推定されたラベル

プロンプトリーキングデータ毒性攻撃モデル抽出攻撃

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

In-context learning (ICL) has become a powerful, data-efficient paradigm for text classification using large language models. However, its robustness against realistic adversarial threats remains largely unexplored. We introduce ICL-Evader, a novel black-box evasion attack framework that operates under a highly practical zero-query threat model, requiring no access to model parameters, gradients, or query-based feedback during attack generation. We design three novel attacks, Fake Claim, Template, and Needle-in-a-Haystack, that exploit inherent limitations of LLMs in processing in-context prompts. Evaluated across sentiment analysis, toxicity, and illicit promotion tasks, our attacks significantly degrade classifier performance (e.g., achieving up to 95.3