敵対的攻撃検出

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Authors: Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li | Published: 2026-01-07
プロンプトインジェクション
大規模言語モデル
敵対的攻撃検出

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He | Published: 2026-01-07
プロンプトインジェクション
大規模言語モデル
敵対的攻撃検出

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Authors: Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu | Published: 2026-01-06
プロンプトインジェクション
モデル抽出攻撃
敵対的攻撃検出

Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

Authors: Yifan Huang, Xiaojun Jia, Wenbo Guo, Yuqiang Sun, Yihao Huang, Chong Wang, Yang Liu | Published: 2025-12-24
データ選択戦略
プロンプトインジェクション
敵対的攻撃検出

Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework

Authors: Xia Du, Xiaoyuan Liu, Jizhe Zhou, Zheng Lin, Chi-man Pun, Zhe Chen, Wei Ni, Jun Luo | Published: 2025-06-12
モデルの頑健性保証
敵対的学習
敵対的攻撃検出

Let the Noise Speak: Harnessing Noise for a Unified Defense Against Adversarial and Backdoor Attacks

Authors: Md Hasan Shahriar, Ning Wang, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou | Published: 2024-06-18 | Updated: 2025-04-14
モデルの頑健性保証
再構成攻撃
敵対的攻撃検出

Detecting Adversarial Spectrum Attacks via Distance to Decision Boundary Statistics

Authors: Wenwei Zhao, Xiaowen Li, Shangqing Zhao, Jie Xu, Yao Liu, Zhuo Lu | Published: 2024-02-14
敵対的サンプル
敵対的スペクトル攻撃検出
敵対的攻撃検出

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Authors: Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin | Published: 2024-02-13 | Updated: 2024-06-03
LLMセキュリティ
プロンプトインジェクション
敵対的攻撃検出

System-level Analysis of Adversarial Attacks and Defenses on Intelligence in O-RAN based Cellular Networks

Authors: Azuka Chiejina, Brian Kim, Kaushik Chowhdury, Vijay K. Shah | Published: 2024-02-10 | Updated: 2024-02-13
O-RANセキュリティ
サイバー攻撃
敵対的攻撃検出

Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features

Authors: Aku Kammonen, Lisi Liang, Anamika Pandey, Raúl Tempone | Published: 2024-02-01
ウォーターマーキング
バイアス
敵対的攻撃検出