ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

Computing Research Repository (CoRR)

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2506.10960

PDF

https://arxiv.org/pdf/2506.10960

Bibliographic Information

Authors: Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng
Published: 2025-6-14
Affiliation: Zhejiang University
Country of affiliation: China
Venue: Computing Research Repository (CoRR)

Labels Estimated by AI

Computing output harmfulness, Data collection methods, Prompt leaking

Abstract

Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.