Abstract
Large language models (LLMs) for automatic code generation have achieved
breakthroughs in several programming tasks. Their success on competition-level programming problems has made them an essential pillar of AI-assisted pair programming, and tools such as GitHub Copilot have become part of the daily programming workflow of millions of developers. The training data for
these models is usually collected from the Internet (e.g., from open-source
repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during code generation. While
these models have been extensively assessed for their ability to produce
functionally correct programs, there remains a lack of comprehensive
investigations and benchmarks addressing the security aspects of these models.
In this work, we propose a method to systematically study the security issues of code language models and assess their susceptibility to generating vulnerable code. To this end, we introduce the first approach for automatically finding vulnerable code generated by black-box code generation models: we approximately invert the black-box model via few-shot prompting, recovering prompts that can steer it toward insecure completions. We evaluate the effectiveness of our approach by examining whether code language models generate code containing high-risk security weaknesses. Furthermore, we use our method to establish a collection of diverse non-secure prompts covering various vulnerability scenarios. This dataset forms a benchmark for evaluating and comparing security weaknesses across code language models.
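To make the inversion step concrete, the sketch below illustrates one way few-shot prompt inversion could be set up against a black-box model. It is a minimal illustration under stated assumptions, not the paper's implementation: the `query_model` callable and the `FEW_SHOT` example pairs are hypothetical stand-ins for whatever API and seed examples the actual system uses.

```python
# Minimal sketch of few-shot prompt inversion for a black-box code model.
# `query_model` is a hypothetical stand-in for any completion API; it is
# passed in as a callable so nothing here depends on a specific service.

from typing import Callable

# Few-shot examples pairing a vulnerable code snippet (here, a SQL-injection
# pattern, CWE-89) with a prompt that could plausibly have produced it.
# These pairs are illustrative assumptions, not data from the paper.
FEW_SHOT = [
    (
        'cursor.execute("SELECT * FROM users WHERE id = %s" % user_id)',
        "# Return the user record matching the given id",
    ),
]


def build_inversion_prompt(vulnerable_code: str) -> str:
    """Assemble a few-shot prompt that asks the model to map code back
    to the prompt that would elicit it (approximate inversion)."""
    parts = []
    for code, prompt in FEW_SHOT:
        parts.append(f"Code:\n{code}\nPrompt:\n{prompt}\n")
    # Leave the final "Prompt:" slot empty so the model completes it.
    parts.append(f"Code:\n{vulnerable_code}\nPrompt:\n")
    return "\n".join(parts)


def invert(vulnerable_code: str, query_model: Callable[[str], str]) -> str:
    """Query the black-box model once and return a candidate non-secure
    prompt for the given vulnerable code snippet."""
    return query_model(build_inversion_prompt(vulnerable_code))
```

A candidate prompt returned by `invert` would then be fed back to the same model to sample completions, and those completions could be scanned (e.g., with a static analyzer) to confirm that the targeted weakness actually reappears.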