Defense Mechanism

NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models

Authors: Chuhan Zhang, Ye Zhang, Bowen Shi, Yuyou Gan, Tianyu Du, Shouling Ji, Dazhan Deng, Yingcai Wu | Published: 2025-09-04
Prompt Injection
Neurons and Safety
Defense Mechanism

AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema

Authors: Ting-Chun Liu, Ching-Yu Hsu, Kuan-Yi Lee, Chi-An Fu, Hung-yi Lee | Published: 2025-08-27 | Updated: 2025-10-09
Indirect Prompt Injection
Multi-Objective Optimization
Defense Mechanism

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols

Authors: Yixuan Yang, Daoyuan Wu, Yufan Chen | Published: 2025-08-17 | Updated: 2025-10-09
Prompt Leaking
Large Language Model
Defense Mechanism

Privacy and Security Threat for OpenAI GPTs

Authors: Wei Wenying, Zhao Kaifa, Xue Lei, Fan Ming | Published: 2025-06-04
Disabling Safety Mechanisms of LLM
Privacy Issues
Defense Mechanism

SuperPure: Efficient Purification of Localized and Distributed Adversarial Patches via Super-Resolution GAN Models

Authors: Hossein Khalili, Seongbin Park, Venkat Bollapragada, Nader Sehatbakhsh | Published: 2025-05-22
Adversarial Learning
Computational Complexity
Defense Mechanism

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval

Authors: Taiye Chen, Zeming Wei, Ang Li, Yisen Wang | Published: 2025-05-21
RAG
Large Language Model
Defense Mechanism

Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses

Authors: Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye | Published: 2025-05-21
Alignment
Prompt Injection
Defense Mechanism

Model-agnostic clean-label backdoor mitigation in cybersecurity environments

Authors: Giorgio Severi, Simona Boboila, John Holodnak, Kendra Kratkiewicz, Rauf Izmailov, Michael J. De Lucia, Alina Oprea | Published: 2024-07-11 | Updated: 2025-05-05
Backdoor Detection
Backdoor Attack
Defense Mechanism

Large Language Model Sentinel: LLM Agent for Adversarial Purification

Authors: Guang Lin, Toshihisa Tanaka, Qibin Zhao | Published: 2024-05-24 | Updated: 2025-04-23
Prompt validation
Adversarial Text Purification
Defense Mechanism

ModSec-AdvLearn: Countering Adversarial SQL Injections with Robust Machine Learning

Authors: Giuseppe Floris, Christian Scano, Biagio Montaruli, Luca Demetrio, Andrea Valenza, Luca Compagna, Davide Ariu, Luca Piras, Davide Balzarotti, Battista Biggio | Published: 2023-08-09 | Updated: 2025-05-21
Relationship between Robustness and Privacy
Adversarial Example Detection
Defense Mechanism