Attack Evaluation

SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

Authors: Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang | Published: 2023-10-19
Membership Inference
Model Extraction Attack
Attack Evaluation

Attack Prompt Generation for Red Teaming and Defending Large Language Models

Authors: Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He | Published: 2023-10-19
Prompt Injection
Attack Evaluation
Adversarial Examples

Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

Authors: Rui Wen, Tianhao Wang, Michael Backes, Yang Zhang, Ahmed Salem | Published: 2023-10-17
Privacy Techniques
Model Extraction Attack
Attack Evaluation

BufferSearch: Generating Black-Box Adversarial Texts With Lower Queries

Authors: Wenjie Lv, Zhen Wang, Yitao Zheng, Zhehua Zhong, Qi Xuan, Tianyi Chen | Published: 2023-10-14
Attack Evaluation
Adversarial Examples
Optimization Methods

On the Feasibility of Cross-Language Detection of Malicious Packages in npm and PyPI

Authors: Piergiorgio Ladisa, Serena Elisa Ponta, Nicola Ronzoni, Matias Martinez, Olivier Barais | Published: 2023-10-14
Malicious Package Detection
Attack Evaluation
Feature Selection Methods

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation

Authors: Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen | Published: 2023-10-10
Prompt Injection
Attack Evaluation
Adversarial Attack

Test-Time Poisoning Attacks Against Test-Time Adaptation Models

Authors: Tianshuo Cong, Xinlei He, Yun Shen, Yang Zhang | Published: 2023-08-16
Poisoning
Model Performance Evaluation
Attack Evaluation

Diff-CAPTCHA: An Image-based CAPTCHA with Security Enhanced by Denoising Diffusion Model

Authors: Ran Jiang, Sanfeng Zhang, Linfeng Liu, Yanbing Peng | Published: 2023-08-16
Security Assurance
Attack Evaluation
Watermark Durability

Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Authors: Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan | Published: 2023-07-14
Prompt Injection
Dialogue Systems
Attack Evaluation

Group-based Robustness: A General Framework for Customized Robustness in the Real World

Authors: Weiran Lin, Keane Lucas, Neo Eyal, Lujo Bauer, Michael K. Reiter, Mahmood Sharif | Published: 2023-06-29 | Updated: 2024-03-10
Group-based Robustness
Attack Evaluation
Adversarial Attack Detection