Defense Methods

MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models

Authors: Xueqi Cheng, Minxing Zheng, Shixiang Zhu, Yushun Dong | Published: 2025-06-03
Model Extraction Attacks
Detection of Model Extraction Attacks
Defense Methods

DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

Authors: Yi Wang, Fenghua Weng, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang | Published: 2025-02-17 | Updated: 2025-05-29
LLM Security
Prompt Injection
Defense Methods

BADTV: Unveiling Backdoor Threats in Third-Party Task Vectors

Authors: Chia-Yi Hsu, Yu-Lin Tsai, Yu Zhe, Yan-Lun Chen, Chih-Hsun Lin, Chia-Mu Yu, Yang Zhang, Chun-Ying Huang, Jun Sakuma | Published: 2025-01-04
Backdoor Attacks
Defense Methods

Safeguarding System Prompts for LLMs

Authors: Zhifeng Jiang, Zhihua Jin, Guoliang He | Published: 2024-12-18 | Updated: 2025-01-09
LLM Performance Evaluation
Prompt Injection
Defense Methods

Optimal Defenses Against Gradient Reconstruction Attacks

Authors: Yuxiao Chen, Gamze Gürsoy, Qi Lei | Published: 2024-11-06
Poisoning
Defense Methods

Resilience in Knowledge Graph Embeddings

Authors: Arnab Sharma, N'Dah Jean Kouagou, Axel-Cyrille Ngonga Ngomo | Published: 2024-10-28
Membership Inference
Defense Methods

Time Traveling to Defend Against Adversarial Example Attacks in Image Classification

Authors: Anthony Etim, Jakub Szefer | Published: 2024-10-10
Attack Methods
Adversarial Examples
Defense Methods

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Authors: Donghyun Lee, Mo Tiwari | Published: 2024-10-09
Prompt Injection
Attack Methods
Defense Methods

SecAlign: Defending Against Prompt Injection with Preference Optimization

Authors: Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, David Wagner, Chuan Guo | Published: 2024-10-07 | Updated: 2025-01-13
LLM Security
Prompt Injection
Defense Methods

SoK: Towards Security and Safety of Edge AI

Authors: Tatjana Wingarz, Anne Lauscher, Janick Edinger, Dominik Kaaser, Stefan Schulte, Mathias Fischer | Published: 2024-10-07
Bias
Privacy Protection
Defense Methods