攻撃手法

Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents

Authors: Juhee Kim, Woohyuk Choi, Byoungyoung Lee | Published: 2025-03-17
インダイレクトプロンプトインジェクション
データ流分析
攻撃手法

BLIA: Detect model memorization in binary classification model through passive Label Inference attack

Authors: Mohammad Wahiduzzaman Khan, Sheng Chen, Ilya Mironov, Leizhen Zhang, Rabib Noor | Published: 2025-03-17
データキュレーション
差分プライバシー
攻撃手法

Winning the MIDST Challenge: New Membership Inference Attacks on Diffusion Models for Tabular Data Synthesis

Authors: Xiaoyu Wu, Yifei Pang, Terrance Liu, Steven Wu | Published: 2025-03-15
データ生成手法
メンバーシップ開示リスク
攻撃手法

Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection

Authors: Tianwei Lan, Luca Demetrio, Farid Nait-Abdesselam, Yufei Han, Simone Aonzo | Published: 2025-03-14
バックドア攻撃
ラベル
攻撃手法

Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search

Authors: Andy Zhou | Published: 2025-03-13 | Updated: 2025-03-16
LLMの安全機構の解除
攻撃手法
生成モデル

Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis

Authors: Jeonghwan Park, Niall McLaughlin, Ihsen Alouani | Published: 2025-03-04 | Updated: 2025-03-16
攻撃手法
敵対的サンプルの検知
深層学習

Can Indirect Prompt Injection Attacks Be Detected and Removed?

Authors: Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi | Published: 2025-02-23
プロンプトの検証
悪意のあるプロンプト
攻撃手法

Safety at Scale: A Comprehensive Survey of Large Model Safety

Authors: Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Jiaming Zhang, Xiang Zheng, Yang Bai, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Xudong Han, Haonan Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Tim Baldwin, Bo Li, Masashi Sugiyama, Dacheng Tao, James Bailey, Yu-Gang Jiang | Published: 2025-02-02 | Updated: 2025-03-19
インダイレクトプロンプトインジェクション
プロンプトインジェクション
攻撃手法

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Authors: Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei | Published: 2025-01-09
テキストシャッフル不整合
プロンプトインジェクション
攻撃手法

AutoDFL: A Scalable and Automated Reputation-Aware Decentralized Federated Learning

Authors: Meryem Malak Dif, Mouhamed Amine Bouchiha, Mourad Rabah, Yacine Ghamri-Doudane | Published: 2025-01-08
プライバシー保護
フレームワーク
攻撃手法