敵対的攻撃

May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks

Authors: Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes | Published: 2025-07-10
インダイレクトプロンプトインジェクション
敵対的攻撃
防御手法

BarkBeetle: Stealing Decision Tree Models with Fault Injection

Authors: Qifan Wang, Jonas Sander, Minmin Jiang, Thomas Eisenbarth, David Oswald | Published: 2025-07-09
モデル抽出攻撃
敵対的攻撃
特徴選択手法

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

Authors: Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian | Published: 2025-07-08
プロンプトインジェクション
敵対的攻撃
防御効果分析

The Hidden Threat in Plain Text: Attacking RAG Data Loaders

Authors: Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi | Published: 2025-07-07
RAGへのポイズニング攻撃
大規模言語モデル
敵対的攻撃

Amplifying Machine Learning Attacks Through Strategic Compositions

Authors: Yugeng Liu, Zheng Li, Hai Huang, Michael Backes, Yang Zhang | Published: 2025-06-23
メンバーシップ開示リスク
モデルの頑健性保証
敵対的攻撃

LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji | Published: 2025-06-11
LLMの安全機構の解除
プロンプトインジェクション
敵対的攻撃

A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning

Authors: Greg Gluch, Shafi Goldwasser | Published: 2025-04-28 | Updated: 2025-07-10
モデルの頑健性保証
敵対的攻撃
計算問題

Support is All You Need for Certified VAE Training

Authors: Changming Xu, Debangshu Banerjee, Deepak Vasisht, Gagandeep Singh | Published: 2025-04-16
学習の改善
敵対的攻撃
透かし設計

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

Authors: Ken Ziyu Liu, Christopher A. Choquette-Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Percy Liang, Nicolas Papernot | Published: 2025-03-21 | Updated: 2025-03-25
RAG
メンバーシップ開示リスク
敵対的攻撃

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi | Published: 2024-11-27 | Updated: 2025-03-20
プロンプトインジェクション
安全性アライメント
敵対的攻撃