AIセキュリティポータルbot

LLM-PBE: Assessing Data Privacy in Large Language Models

Authors: Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song | Published: 2024-08-23 | Updated: 2024-09-06
LLMセキュリティ
プライバシー保護手法
プロンプトインジェクション

Efficient Detection of Toxic Prompts in Large Language Models

Authors: Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu | Published: 2024-08-21 | Updated: 2024-09-14
コンテンツモデレーション
プロンプトインジェクション
モデル性能評価

Private Counting of Distinct Elements in the Turnstile Model and Extensions

Authors: Monika Henzinger, A. R. Sricharan, Teresa Anna Steiner | Published: 2024-08-21
アルゴリズム
プライバシー保護手法

Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks

Authors: Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li | Published: 2024-08-21
バックドア攻撃
ポイズニング

EEG-Defender: Defending against Jailbreak through Early Exit Generation of Large Language Models

Authors: Chongwen Zhao, Zhihao Dou, Kaizhu Huang | Published: 2024-08-21
LLMセキュリティ
プロンプトインジェクション
防御手法

Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles

Authors: Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu | Published: 2024-08-20 | Updated: 2025-02-07
プロンプトインジェクション
大規模言語モデル
攻撃シナリオ分析

Security Attacks on LLM-based Code Completion Tools

Authors: Wen Cheng, Ke Sun, Xinyu Zhang, Wei Wang | Published: 2024-08-20 | Updated: 2025-01-02
LLMセキュリティ
プロンプトインジェクション
攻撃手法

Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks

Authors: Hetvi Waghela, Jaydip Sen, Sneha Rakshit | Published: 2024-08-20
ポイズニング
敵対的サンプル
防御手法

LeCov: Multi-level Testing Criteria for Large Language Models

Authors: Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma | Published: 2024-08-20
LLM性能評価
テスト優先順位付け
プロンプトインジェクション

Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions

Authors: Jinxin Liu, Zao Yang | Published: 2024-08-20 | Updated: 2024-09-05
LLM性能評価
プライバシー保護手法
評価手法