Abstract
The growing use of Large Language Models (LLMs) for code
generation raises the question of whether they can generate trustworthy
code. While many researchers are exploring the utility of code generation for
uncovering software vulnerabilities, one crucial but often overlooked aspect is
the use of security Application Programming Interfaces (APIs). Security APIs play
an integral role in upholding software security, yet integrating them effectively
presents substantial challenges. This leads to inadvertent misuse by
developers, thereby exposing software to vulnerabilities. To overcome these
challenges, developers may seek assistance from LLMs. In this paper, we
systematically assess ChatGPT's trustworthiness in code generation for security
API use cases in Java. To conduct a thorough evaluation, we compile an
extensive collection of 48 programming tasks for 5 widely used security APIs.
We employ both automated and manual approaches to effectively detect security
API misuse in the code generated by ChatGPT for these tasks. Our findings are
concerning: around 70% of the code instances across 30 attempts per task
contain security API misuse, with 20 distinct misuse types identified.
Moreover, for roughly half of the tasks, this rate reaches 100%, indicating
that there is a long way to go before developers can rely on ChatGPT to
securely implement security API code.