Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review | AI Security Portal


arxiv


AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2603.18740

PDF

https://arxiv.org/pdf/2603.18740

Bibliographic information

Authors: Dimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos, Diomidis Spinellis
Published: 2026-3-19
Affiliation: University of Athens
Country of affiliation: Greece
Conference name

AI-estimated labels

Reviews and surveys, Indirect prompt injection, Prompt validation

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), ranging from interactive assistants to autonomous agents in CI/CD pipelines. We study whether confirmation bias (i.e., the tendency to favor interpretations that align with prior expectations) affects LLM-based vulnerability detection, and whether this failure mode can be exploited in software supply-chain attacks. We conduct two complementary studies. Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly asymmetric effects: false negatives increase sharply while false positive rates change little. Bias effects vary by vulnerability type, with injection flaws being more susceptible to them than memory corruption bugs. Study 2 evaluates exploitability in practice by mimicking adversarial pull requests that reintroduce known vulnerabilities while being framed as security improvements or urgent functionality fixes via their pull request metadata. Adversarial framing succeeds in 35% of cases against GitHub Copilot (interactive assistant) under one-shot attacks and in 88% of cases against Claude Code (autonomous agent) in real project configurations where adversaries can iteratively refine their framing to increase attack success. Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases. Our results show that confirmation bias poses a weakness in LLM-based code review, with implications for how AI-assisted development tools are deployed.
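The debiasing idea mentioned in the abstract (metadata redaction plus explicit instructions) could look roughly like the following. This is a minimal sketch, not the authors' implementation: the `PullRequest` container, the `NEUTRAL_INSTRUCTION` wording, and the `build_debiased_prompt` helper are all hypothetical names introduced here for illustration, and the abstract does not specify which metadata fields were redacted or how the instruction was phrased.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical container for what a reviewer sees in a pull request."""
    title: str
    description: str
    commit_messages: list[str] = field(default_factory=list)
    diff: str = ""


# Assumed wording; the abstract does not give the actual instruction text.
NEUTRAL_INSTRUCTION = (
    "Review the following diff for security vulnerabilities. "
    "Do not assume the change is safe or unsafe; judge only the code."
)


def build_debiased_prompt(pr: PullRequest) -> str:
    """Build a review prompt that omits author-supplied framing.

    The title, description, and commit messages are deliberately dropped,
    so only the diff and a neutral instruction reach the model.
    """
    return f"{NEUTRAL_INSTRUCTION}\n\n--- DIFF ---\n{pr.diff}"
```

A prompt built this way carries none of the pull request's self-description, so a title such as "Security hardening" cannot set the reviewer's expectations. Whether this matches the redaction used in the paper is an assumption.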

External datasets

CrossVuln

GitHub search API dataset

References

2023 IEEE 9th International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE)

Vulnerability detection and monitoring using LLM

Vishwanath Akuthota, Raghunandan Kasula, Sabiha Tasnim Sumona, Masud Mohiuddin, Md Tanzim Reza, Md Mizanur Rahman

Published: 2023

claude-code-action GitHub repository

Published: 2026

Claude Code documentation

Published: 2026

Code review plugin of Claude Code

Published: 2026

CRken: AI-Powered Code Review for GitLab

Published: 2024

2025 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)

Rethinking code review workflows with LLM assistance: An empirical study

Fannar Steinn Aðalsteinsson, Björn Borgar Magnússon, Mislav Milicevic, Adam Nirving Davidsson, Chih-Hong Cheng

Published: 2025

IEEE Transactions on Software Engineering

Deep learning based vulnerability detection: Are we there yet?

Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray

Published: 2022

Applied and Computational Engineering

Cognitive biases in large language model based decision making: Insights and mitigation strategies

Siduo Chen

Published: 2025

From yes-men to truth-tellers: Addressing sycophancy in large language models with pinpoint tuning

Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye

Published: 2025

Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, FSE Companion ’25

Autoreview: An LLM-based multi-agent system for security issue-oriented code review

Yujia Chen

Published: 2025