LLbezpeky: Leveraging Large Language Models for Vulnerability Detection

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2401.01269

PDF

https://arxiv.org/pdf/2401.01269

Paper Information

Authors: Noble Saji Mathews; Yelizaveta Brus; Yousra Aafer; Meiyappan Nagappan; Shane McIntosh
Published: 1-3-2024
Updated: 2-14-2024
Affiliation: University of Waterloo
Country: Canada
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Vulnerability Management, Prompt Injection, LLM Performance Evaluation

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Despite continued research and progress in building secure systems, Android applications remain riddled with vulnerabilities, necessitating effective detection methods. Current strategies involving static and dynamic analysis tools come with limitations, such as an overwhelming number of false positives and a limited scope of analysis, which make either difficult to adopt. Over the past years, machine learning based approaches have been extensively explored for vulnerability detection, but their real-world applicability is constrained by data requirements and feature engineering challenges. Large Language Models (LLMs), with their vast number of parameters, have shown tremendous potential in understanding semantics in human as well as programming languages. We examine the efficacy of LLMs for detecting vulnerabilities in the context of Android security. We focus on building an AI-driven workflow to assist developers in identifying and rectifying vulnerabilities. Our experiments show that LLMs outperform our expectations in finding issues within applications, correctly flagging insecure apps in 91.67% of cases in the Ghera benchmark. We use inferences from our experiments to build a robust and actionable vulnerability detection system and demonstrate its effectiveness. Our experiments also shed light on how various simple configurations can affect the True Positive (TP) and False Positive (FP) rates.

External Datasets

Ghera