バックドア攻撃

Finetuning-Activated Backdoors in LLMs

Authors: Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev | Published: 2025-05-22
LLMセキュリティ
バックドア攻撃
プロンプトインジェクション

Analysis of the vulnerability of machine learning regression models to adversarial attacks using data from 5G wireless networks

Authors: Leonid Legashev, Artur Zhigalov, Denis Parfenov | Published: 2025-05-01
バックドア攻撃
ポイズニング
攻撃タイプ

How to Backdoor the Knowledge Distillation

Authors: Chen Wu, Qian Ma, Prasenjit Mitra, Sencun Zhu | Published: 2025-04-30
バックドア攻撃
敵対的学習
知識蒸留の脆弱性

Detecting Instruction Fine-tuning Attacks on Language Models using Influence Function

Authors: Jiawei Li | Published: 2025-04-12 | Updated: 2025-09-30
バックドア攻撃
プロンプトの検証
感情分析

BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun | Published: 2025-03-20
バックドア攻撃
プロンプトインジェクション
大規模言語モデル

Trust Under Siege: Label Spoofing Attacks against Machine Learning for Android Malware Detection

Authors: Tianwei Lan, Luca Demetrio, Farid Nait-Abdesselam, Yufei Han, Simone Aonzo | Published: 2025-03-14
バックドア攻撃
ラベル
攻撃手法

ToxicSQL: Migrating SQL Injection Threats into Text-to-SQL Models via Backdoor Attack

Authors: Meiyu Lin, Haichuan Zhang, Jiale Lao, Renyuan Li, Yuanchun Zhou, Carl Yang, Yang Cao, Mingjie Tang | Published: 2025-03-07 | Updated: 2025-04-03
バックドアモデルの検知
バックドア攻撃
モデル性能評価

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks

Authors: Hanjiang Hu, Alexander Robey, Changliu Liu | Published: 2025-02-28 | Updated: 2025-08-25
バックドア攻撃
プロンプトインジェクション
透かし

BackdoorDM: A Comprehensive Benchmark for Backdoor Learning on Diffusion Model

Authors: Weilin Lin, Nanjun Zhou, Yanyun Wang, Jianze Li, Hui Xiong, Li Liu | Published: 2025-02-17 | Updated: 2025-07-21
トリガーの検知
バックドア攻撃
性能評価

Provably effective detection of effective data poisoning attacks

Authors: Jonathan Gallagher, Yasaman Esfandiari, Callen MacPhee, Michael Warren | Published: 2025-01-21
バックドア攻撃
ポイズニング
実験的検証