敵対的攻撃検出

HoneyTrap: Deceiving Large Language Model Attackers to Honeypot Traps with Resilient Multi-Agent Defense

Authors: Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li | Published: 2026-01-07

プロンプトインジェクション

大規模言語モデル

敵対的攻撃検出

2026.01.07

文献データベース

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense

Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He | Published: 2026-01-07

プロンプトインジェクション

大規模言語モデル

敵対的攻撃検出

2026.01.07

文献データベース

JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification

Authors: Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu | Published: 2026-01-06

プロンプトインジェクション

モデル抽出攻撃

敵対的攻撃検出

2026.01.06

文献データベース

Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

Authors: Yifan Huang, Xiaojun Jia, Wenbo Guo, Yuqiang Sun, Yihao Huang, Chong Wang, Yang Liu | Published: 2025-12-24

データ選択戦略

プロンプトインジェクション

敵対的攻撃検出

2025.12.24

文献データベース

Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework

Authors: Xia Du, Xiaoyuan Liu, Jizhe Zhou, Zheng Lin, Chi-man Pun, Zhe Chen, Wei Ni, Jun Luo | Published: 2025-06-12

モデルの頑健性保証

敵対的学習

敵対的攻撃検出

2025.06.12

文献データベース

Let the Noise Speak: Harnessing Noise for a Unified Defense Against Adversarial and Backdoor Attacks

Authors: Md Hasan Shahriar, Ning Wang, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou | Published: 2024-06-18 | Updated: 2025-04-14

モデルの頑健性保証

再構成攻撃

敵対的攻撃検出

2024.06.18

文献データベース

Detecting Adversarial Spectrum Attacks via Distance to Decision Boundary Statistics

Authors: Wenwei Zhao, Xiaowen Li, Shangqing Zhao, Jie Xu, Yao Liu, Zhuo Lu | Published: 2024-02-14

敵対的サンプル

敵対的スペクトル攻撃検出

敵対的攻撃検出

2024.02.14 2025.04.03

文献データベース

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

Authors: Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin | Published: 2024-02-13 | Updated: 2024-06-03

LLMセキュリティ

プロンプトインジェクション

敵対的攻撃検出

2024.02.13 2025.04.03

文献データベース

System-level Analysis of Adversarial Attacks and Defenses on Intelligence in O-RAN based Cellular Networks

Authors: Azuka Chiejina, Brian Kim, Kaushik Chowhdury, Vijay K. Shah | Published: 2024-02-10 | Updated: 2024-02-13

O-RANセキュリティ

サイバー攻撃

敵対的攻撃検出

2024.02.10 2025.04.03

文献データベース

Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features

Authors: Aku Kammonen, Lisi Liang, Anamika Pandey, Raúl Tempone | Published: 2024-02-01

ウォーターマーキング

バイアス

敵対的攻撃検出

2024.02.01 2025.04.03

文献データベース