Abstract
Generative large language models (LLMs) have revolutionized natural language
processing with their transformative and emergent capabilities. However, recent
evidence indicates that LLMs can produce harmful content that violates social
norms, raising serious concerns about the safety and ethical ramifications of
deploying these advanced models. A rigorous and comprehensive safety evaluation
of LLMs before deployment is therefore imperative. Despite this need, owing to
the vastness of the LLM generation space, the field still lacks both a unified,
standardized risk taxonomy that systematically characterizes LLM content safety
and automated assessment techniques that can explore potential risks
efficiently.
To bridge this gap, we propose S-Eval, a novel LLM-based automated Safety
Evaluation framework built on a newly defined comprehensive risk taxonomy.
S-Eval incorporates two key components: an expert testing LLM ${M}_t$ and a
novel safety critique LLM ${M}_c$. ${M}_t$ automatically generates test cases
in accordance with the proposed risk taxonomy, while ${M}_c$ provides
quantitative and explainable safety evaluations for better risk awareness of
LLMs. In contrast to prior works, S-Eval is efficient and
effective in both test generation and safety evaluation. Moreover, thanks to
its LLM-based architecture, S-Eval can be flexibly configured and adapted to
the rapid evolution of LLMs and the accompanying new safety threats, test
generation methods, and safety critique methods. S-Eval has been deployed at
our industrial partner for the automated safety evaluation of multiple LLMs
serving millions of users, demonstrating its effectiveness in real-world
scenarios. Our benchmark is publicly available at
https://github.com/IS2Lab/S-Eval.
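
To make the two-component design concrete, the following Python sketch shows one plausible way the pipeline could compose: ${M}_t$ generates a taxonomy-guided test prompt, the model under evaluation answers it, and ${M}_c$ scores the answer. This is a purely illustrative sketch, not the paper's implementation; every name here (`evaluate`, `SafetyVerdict`, the callables, and the scoring fields) is a hypothetical placeholder, and the actual prompts, taxonomy, and scoring scheme are defined in the paper.

```python
# Hypothetical sketch of an S-Eval-style two-component pipeline.
# All names and interfaces are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SafetyVerdict:
    risk_category: str  # taxonomy leaf the test case targets
    score: float        # quantitative safety score from the critique LLM
    rationale: str      # explainable justification accompanying the score


def evaluate(
    risk_taxonomy: List[str],
    testing_llm: Callable[[str], str],    # M_t: risk category -> test prompt
    target_llm: Callable[[str], str],     # LLM under evaluation: prompt -> response
    critique_llm: Callable[[str, str, str], SafetyVerdict],  # M_c: (category, prompt, response) -> verdict
    cases_per_category: int = 1,
) -> List[SafetyVerdict]:
    """Generate taxonomy-guided test cases with M_t, collect the target
    model's responses, and score each response with M_c."""
    verdicts: List[SafetyVerdict] = []
    for category in risk_taxonomy:
        for _ in range(cases_per_category):
            prompt = testing_llm(category)        # M_t generates a test case
            response = target_llm(prompt)          # model under test answers
            verdicts.append(critique_llm(category, prompt, response))  # M_c scores it
    return verdicts
```

Because both components are ordinary callables here, swapping in a new test generation method or critique model only changes the functions passed to `evaluate`, which mirrors the configurability the LLM-based architecture is claimed to provide.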