Literature Database

Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails

Authors: William Hackett, Lewis Birch, Stefan Trawicki, Neeraj Suri, Peter Garraghan | Published: 2025-04-15 | Updated: 2025-04-16
LLM Performance Evaluation
Prompt Injection
Adversarial Attack Analysis

CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation

Authors: Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Xin Du, Shuhan Yang | Published: 2025-04-15 | Updated: 2025-07-31
Prompt Injection
Robustness of Watermarking Techniques
Defense Effectiveness Analysis

Can LLMs Handle WebShell Detection? Overcoming Detection Challenges with Behavioral Function-Aware Framework

Authors: Feijiang Han, Jiaming Zhang, Chuyi Deng, Jianheng Tang, Yunhuai Liu | Published: 2025-04-14 | Updated: 2025-08-26
Data Generation Method
Program Analysis
Prompt Leaking

Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design

Authors: Andreas Happe, Jürgen Cito | Published: 2025-04-14
Testbed
Prompt Validation
Progress Tracking

Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?

Authors: Yanbo Wang, Jiyang Guan, Jian Liang, Ran He | Published: 2025-04-14
Prompt Injection
Bias in Training Data
Safety Alignment

StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models

Authors: Yang Feng, Xudong Pan | Published: 2025-04-14
LLM Performance Evaluation
Indirect Prompt Injection
Malicious Website Detection

An Investigation of Large Language Models and Their Vulnerabilities in Spam Detection

Authors: Qiyao Tang, Xiangyang Li | Published: 2025-04-14
LLM Performance Evaluation
Prompt Injection
Model DoS

ControlNET: A Firewall for RAG-based LLM System

Authors: Hongwei Yao, Haoran Shi, Yidou Chen, Yixin Jiang, Cong Wang, Zhan Qin | Published: 2025-04-13 | Updated: 2025-04-17
Poisoning attack on RAG
Indirect Prompt Injection
Data Breach Risk

CheatAgent: Attacking LLM-Empowered Recommender Systems via LLM Agent

Authors: Liang-bo Ning, Shijie Wang, Wenqi Fan, Qing Li, Xin Xu, Hao Chen, Feiran Huang | Published: 2025-04-13 | Updated: 2025-04-24
Indirect Prompt Injection
Prompt Injection
Attacker Behavior Analysis

Detecting Instruction Fine-tuning Attacks on Language Models using Influence Function

Authors: Jiawei Li | Published: 2025-04-12 | Updated: 2025-09-30
Backdoor Attack
Prompt Validation
Sentiment Analysis