PromptLocate: Localizing Prompt Injection Attacks Authors: Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong | Published: 2025-10-14 | Prompt validation, Large Language Model, evaluation metrics | 2025.10.14 2025.10.16 Literature Database
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities Authors: Zicheng Liu, Lige Huang, Jie Zhang, Dongrui Liu, Yuan Tian, Jing Shao | Published: 2025-10-13 | Security Analysis Method, Large Language Model, Defense Mechanism | 2025.10.13 2025.10.15 Literature Database
Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs Authors: Fatmazohra Rezkellah, Ramzi Dakhmouche | Published: 2025-10-03 | Updated: 2025-10-15 | Identification of AI Output, Robustness, Large Language Model | 2025.10.03 2025.10.17 Literature Database
NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks Authors: Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi | Published: 2025-10-03 | Updated: 2025-10-21 | Prompt Injection, Large Language Model, Jailbreak Method | 2025.10.03 2025.10.23 Literature Database
Bypassing Prompt Guards in Production with Controlled-Release Prompting Authors: Jaiden Fairoze, Sanjam Garg, Keewoo Lee, Mingyuan Wang | Published: 2025-10-02 | Prompt Injection, Large Language Model, Structural Attack | 2025.10.02 2025.10.04 Literature Database
EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense Authors: Wei Huang, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin | Published: 2025-09-25 | Phishing Attack, Large Language Model, Self-Evolving Framework | 2025.09.25 2025.09.27 Literature Database
LLM-based Vulnerability Discovery through the Lens of Code Metrics Authors: Felix Weissberg, Lukas Pirch, Erik Imgrund, Jonas Möller, Thorsten Eisenhofer, Konrad Rieck | Published: 2025-09-23 | Code Metrics Evaluation, Prompt Injection, Large Language Model | 2025.09.23 2025.09.25 Literature Database
LLM Jailbreak Detection for (Almost) Free! Authors: Guorui Chen, Yifan Xia, Xiaojun Jia, Zhijiang Li, Philip Torr, Jindong Gu | Published: 2025-09-18 | Large Language Model, Evaluation Method, Watermarking Technology | 2025.09.18 2025.09.20 Literature Database
Yet Another Watermark for Large Language Models Authors: Siyuan Bao, Ying Shi, Zhiguang Yang, Hanzhou Wu, Xinpeng Zhang | Published: 2025-09-16 | Prompt leaking, Large Language Model, Watermarking Technology | 2025.09.16 2025.09.18 Literature Database
NeuroStrike: Neuron-Level Attacks on Aligned LLMs Authors: Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, Stjepan Picek, Ahmad-Reza Sadeghi | Published: 2025-09-15 | Prompt Injection, Large Language Model, Analysis of Safety Mechanisms | 2025.09.15 2025.09.17 Literature Database