Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements
Abstract
Large Language Models (LLMs) are increasingly used for automated software development, making their ability to preserve secure coding practices critical. In practice, however, many security requirements are implicit or underspecified, whereas usability requirements are explicit and high-signal. This asymmetry motivates our investigation of usability pressure as a practical attack surface: realistic usability-oriented requirements (e.g., new features, performance constraints, or simplicity demands) can cause coding LLMs to satisfy explicit usability goals while silently dropping implicit security constraints, a form of reward hacking. We formalize this threat as UPAttack and propose U-SPLOIT, an automated framework for crafting UPAttacks that (i) selects tasks on which a model is initially secure, (ii) synthesizes usability pressure by identifying the usability rewards of insecure alternatives along three vectors (Functionality, Implementation, and Trade-off), and (iii) verifies security regression via both existing test cases and dynamically generated exploit payloads. Across 75 seed scenarios (25 CWEs × 3 cases) spanning three languages (Python, C, and JavaScript), U-SPLOIT achieves attack success rates of up to 98.1% against multiple state-of-the-art models (e.g., GPT-5.2-chat and Gemini-3-Flash-Preview).
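To make the three-stage pipeline concrete, the following is a minimal Python sketch of the select / pressurize / verify loop as described in the abstract. Every name here (SeedTask, query_model, is_secure, exploit_succeeds, synthesize_pressure) is a hypothetical stand-in inferred from the abstract, not the paper's actual implementation.

```python
# Minimal sketch of the U-SPLOIT loop described in the abstract.
# All helpers below are hypothetical placeholders, not the paper's code.

from dataclasses import dataclass

# The three pressure vectors named in the abstract.
PRESSURE_VECTORS = ("Functionality", "Implementation", "Trade-off")

@dataclass
class SeedTask:
    task_id: str
    prompt: str   # coding task with an implicit security expectation
    cwe: str      # targeted weakness class, e.g. "CWE-89"

def query_model(model: str, prompt: str) -> str:
    """Placeholder for an LLM call returning generated code."""
    raise NotImplementedError

def is_secure(code: str, cwe: str) -> bool:
    """Placeholder: run the task's existing security test cases."""
    raise NotImplementedError

def exploit_succeeds(code: str, cwe: str) -> bool:
    """Placeholder: try dynamically generated exploit payloads."""
    raise NotImplementedError

def synthesize_pressure(prompt: str, vector: str) -> str:
    """Placeholder: rewrite the prompt with an explicit usability
    requirement that rewards the insecure alternative on this vector."""
    raise NotImplementedError

def u_sploit(task: SeedTask, model: str) -> dict:
    # (i) Selection: attack only tasks the model initially solves securely,
    # so any later regression is attributable to the usability pressure.
    baseline = query_model(model, task.prompt)
    if not is_secure(baseline, task.cwe):
        return {}

    outcomes = {}
    for vector in PRESSURE_VECTORS:
        # (ii) Pressure synthesis along one of the three vectors.
        pressured = synthesize_pressure(task.prompt, vector)
        candidate = query_model(model, pressured)
        # (iii) Verification: a regression counts only if the existing
        # tests fail or a generated exploit succeeds against the output.
        outcomes[vector] = (not is_secure(candidate, task.cwe)
                            or exploit_succeeds(candidate, task.cwe))
    return outcomes
```

The key design point this sketch preserves is the baseline filter in step (i): by discarding tasks the model already fails, any insecurity observed in step (iii) can be attributed to the injected usability pressure rather than to the model's baseline behavior.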