Abstract
Large language models (LLMs) introduce new security risks, but there are few
comprehensive evaluation suites to measure and reduce these risks. We present
BenchmarkName, a novel benchmark to quantify LLM security risks and
capabilities. We introduce two new areas for testing: prompt injection and code
interpreter abuse. We evaluate multiple state-of-the-art (SOTA) LLMs,
including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our
results show that conditioning away risk of attack remains an unsolved problem;
for example, all tested models succumbed to between 26% and 41% of prompt
injection tests. We further introduce the safety-utility tradeoff: conditioning
an LLM to reject unsafe prompts can cause the LLM to falsely reject answering
benign prompts, which lowers utility. We propose quantifying this tradeoff
using the False Refusal Rate (FRR). As an illustration, we introduce a novel
test set that quantifies FRR for cyberattack-helpfulness risk. We find that
many LLMs are able to comply with "borderline" benign requests while still rejecting
most unsafe requests. Finally, we quantify the utility of LLMs for automating a
core cybersecurity task, that of exploiting software vulnerabilities. This is
important because the offensive capabilities of LLMs are of intense interest;
we quantify these capabilities by creating novel test sets for four representative problems.
We find that models with coding capabilities perform better than those without,
but that further work is needed for LLMs to become proficient at exploit
generation. Our code is open source and can be used to evaluate other LLMs.
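
To make the FRR metric above concrete, the following is a minimal illustrative
sketch of how such a rate could be computed over judged benchmark results. The
data layout, the field names (is_benign, model_refused), and the function name
false_refusal_rate are assumptions made for illustration only; they are not taken
from the released benchmark code.

# Illustrative sketch: FRR as the fraction of benign prompts the model refuses.
# All names below are hypothetical placeholders, not the benchmark's actual API.

from dataclasses import dataclass
from typing import Iterable


@dataclass
class JudgedPrompt:
    is_benign: bool      # True if the prompt is a benign ("borderline") request
    model_refused: bool  # True if the model declined to answer the prompt


def false_refusal_rate(results: Iterable[JudgedPrompt]) -> float:
    """Return the fraction of benign prompts that the model falsely refused."""
    benign = [r for r in results if r.is_benign]
    if not benign:
        return 0.0
    refused = sum(r.model_refused for r in benign)
    return refused / len(benign)


if __name__ == "__main__":
    sample = [
        JudgedPrompt(is_benign=True, model_refused=False),
        JudgedPrompt(is_benign=True, model_refused=True),
        JudgedPrompt(is_benign=False, model_refused=True),
    ]
    print(f"FRR = {false_refusal_rate(sample):.2f}")  # prints: FRR = 0.50

Under this reading, a lower FRR at a fixed refusal rate on unsafe prompts
indicates a better safety-utility tradeoff.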