攻撃手法

Safety at Scale: A Comprehensive Survey of Large Model Safety

Authors: Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Jiaming Zhang, Xiang Zheng, Yang Bai, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Xudong Han, Haonan Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Tim Baldwin, Bo Li, Masashi Sugiyama, Dacheng Tao, James Bailey, Yu-Gang Jiang | Published: 2025-02-02 | Updated: 2025-03-19
インダイレクトプロンプトインジェクション
プロンプトインジェクション
攻撃手法

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Authors: Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei | Published: 2025-01-09
テキストシャッフル不整合
プロンプトインジェクション
攻撃手法

AutoDFL: A Scalable and Automated Reputation-Aware Decentralized Federated Learning

Authors: Meryem Malak Dif, Mouhamed Amine Bouchiha, Mourad Rabah, Yacine Ghamri-Doudane | Published: 2025-01-08
プライバシー保護
フレームワーク
攻撃手法

Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models

Authors: Yanjiang Liu, Shuhen Zhou, Yaojie Lu, Huijia Zhu, Weiqiang Wang, Hongyu Lin, Ben He, Xianpei Han, Le Sun | Published: 2025-01-03
フレームワーク
プロンプトインジェクション
攻撃手法

Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs

Authors: Linhao Huang, Xue Jiang, Zhiqiang Wang, Wentao Mo, Xi Xiao, Bo Han, Yongjie Yin, Feng Zheng | Published: 2025-01-02 | Updated: 2025-01-10
攻撃の評価
攻撃手法
敵対的サンプル

Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models

Authors: Ma Teng, Jia Xiaojun, Duan Ranjie, Li Xinfeng, Huang Yihao, Chu Zhixuan, Liu Yang, Ren Wenqi | Published: 2024-12-08 | Updated: 2025-01-03
コンテンツモデレーション
プロンプトインジェクション
攻撃手法

Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians

Authors: William N. Caballero, Matthew LaRosa, Alexander Fisher, Vahid Tarokh | Published: 2024-11-21
攻撃手法
最適化問題

Unmasking the Shadows: Pinpoint the Implementations of Anti-Dynamic Analysis Techniques in Malware Using LLM

Authors: Haizhou Wang, Nanqing Luo, Xusheng Li, Peng LIu | Published: 2024-11-08 | Updated: 2025-04-29
マルウェア進化
攻撃手法
検出手法の分析

Defense Against Prompt Injection Attack by Leveraging Attack Techniques

Authors: Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Dekai Wu, Bryan Hooi | Published: 2024-11-01 | Updated: 2025-07-22
インダイレクトプロンプトインジェクション
プロンプトインジェクション
攻撃手法

Low-Rank Adversarial PGD Attack

Authors: Dayana Savostianova, Emanuele Zangrando, Francesco Tudisco | Published: 2024-10-16
攻撃手法