This study presents the first comprehensive safety evaluation of the DeepSeek
models, focusing on the safety risks associated with their generated content.
Our evaluation encompasses DeepSeek's latest generation of large language
models, multimodal large language models, and text-to-image models,
systematically examining their propensity to generate unsafe content.
Notably, we developed a bilingual (Chinese-English) safety evaluation dataset
tailored to Chinese sociocultural contexts, enabling a more thorough evaluation
of the safety capabilities of Chinese-developed models. Experimental results
indicate that despite their strong general capabilities, DeepSeek models
exhibit significant safety vulnerabilities across multiple risk dimensions,
including algorithmic discrimination and sexual content. These findings provide
crucial insights for understanding and improving the safety of large foundation
models. Our code is available at
https://github.com/NY1024/DeepSeek-Safety-Eval.