Leveraging Large Language Models to Detect npm Malicious Packages

Authors: Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams | Published: 2024-03-18 | Updated: 2025-01-06

2024.03.182025.04.03

Authors: Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams
Published: 2024-03-18 | Updated: 2025-01-06

Source: https://arxiv.org/abs/2403.12196

PDF: https://arxiv.org/pdf/2403.12196

AIにより推定されたラベル

プロンプトインジェクションマルウェア分類 LLM性能評価

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Existing malicious code detection techniques demand the integration of multiple tools to detect different malware patterns, often suffering from high misclassification rates. Therefore, malicious code detection techniques could be enhanced by adopting advanced, more automated approaches to achieve high accuracy and a low misclassification rate. The goal of this study is to aid security analysts in detecting malicious packages by empirically studying the effectiveness of Large Language Models (LLMs) in detecting malicious code. We present SocketAI, a malicious code review workflow to detect malicious code. To evaluate the effectiveness of SocketAI, we leverage a benchmark dataset of 5,115 npm packages, of which 2,180 packages have malicious code. We conducted a baseline comparison of GPT-3 and GPT-4 models with the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious Javascript code. We also compare the effectiveness of static analysis as a pre-screener with SocketAI workflow, measuring the number of files that need to be analyzed. and the associated costs. Additionally, we performed a qualitative study to understand the types of malicious activities detected or missed by our workflow. Our baseline comparison demonstrates a 16 and 9 respectively. GPT-4 achieves higher accuracy with 99 scores, while GPT-3 offers a more cost-effective balance at 91 94 files requiring LLM analysis by 77.9 and 76.1 of arbitrary code, and suspicious domain categories as the top detected malicious packages.