Jailbreak Distillation: Renewable Safety Benchmarking | Authors: Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson | Published: 2025-05-28 | Tags: Prompt Injection, Model Evaluation, Attack Evaluation
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space | Authors: Yao Huang, Yitong Sun, Shouwei Ruan, Yichi Zhang, Yinpeng Dong, Xingxing Wei | Published: 2025-05-27 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Attack Evaluation
Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling | Authors: Yichuan Cao, Yibo Miao, Xiao-Shan Gao, Yinpeng Dong | Published: 2025-05-27 | Tags: Model Evaluation, Experimental Validation, Attack Evaluation
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs | Authors: Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen | Published: 2025-02-26 | Updated: 2025-05-28 | Tags: LLM Security, Prompt Injection, Attack Evaluation
Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API | Authors: Andrey Labunets, Nishit V. Pandya, Ashish Hooda, Xiaohan Fu, Earlence Fernandes | Published: 2025-01-16 | Tags: Prompt Injection, Attack Evaluation, Optimization Problem
Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks | Authors: Yixiao Xu, Binxing Fang, Rui Wang, Yinghai Zhou, Shouling Ji, Yuan Liu, Mohan Li, Zhihong Tian | Published: 2025-01-16 | Updated: 2025-01-17 | Tags: Watermarking, Model Extraction Attack, Attack Evaluation
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards | Authors: Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, Ion Stoica, Florian Tramer, Chiyuan Zhang | Published: 2025-01-13 | Tags: Cybersecurity, Large Language Model, Attack Evaluation
Learning-based Detection of GPS Spoofing Attack for Quadrotors | Authors: Pengyu Wang, Zhaohua Yang, Jialu Li, Ling Shi | Published: 2025-01-10 | Tags: Cybersecurity, Experimental Validation, Attack Evaluation
Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs | Authors: Linhao Huang, Xue Jiang, Zhiqiang Wang, Wentao Mo, Xi Xiao, Bo Han, Yongjie Yin, Feng Zheng | Published: 2025-01-02 | Updated: 2025-01-10 | Tags: Attack Evaluation, Attack Method, Adversarial Example
FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses | Authors: Isaac Baglin, Xiatian Zhu, Simon Hadfield | Published: 2024-11-05 | Updated: 2025-01-05 | Tags: Poisoning, Attack Evaluation, Evaluation Method