Paper Information
- Author
- Fangzhou Wu;Qingzhao Zhang;Ati Priya Bajaj;Tiffany Bao;Ning Zhang;Ruoyu "Fish" Wang;Chaowei Xiao
- Published
- December 8, 2023
- Affiliation
- University of Wisconsin-Madison
- Country
- United States of America
- Venue
- Computing Research Repository (CoRR)
Abstract
Large language models (LLMs) have undergone rapid evolution and achieved
remarkable results in recent years. OpenAI's ChatGPT, backed by GPT-3.5 or
GPT-4, has gained instant popularity due to its strong capability across a wide
range of tasks, including natural language tasks, coding, mathematics, and
engaging conversations. However, the impact and limits of such LLMs in the
system security domain remain underexplored. In this paper, we delve into the limits of
LLMs (i.e., ChatGPT) in seven software security applications including
vulnerability detection/repair, debugging, debloating, decompilation, patching,
root cause analysis, symbolic execution, and fuzzing. Our exploration reveals
that ChatGPT not only excels at generating code, which is the conventional
application of language models, but also demonstrates strong capability in
understanding user-provided commands in natural languages, reasoning about
control and data flows within programs, generating complex data structures, and
even decompiling assembly code. Notably, GPT-4 showcases significant
improvements over GPT-3.5 in most security tasks. We also identify certain
limitations of ChatGPT in security-related tasks, such as its constrained
ability to process long code contexts.