PromptCOS: Towards System Prompt Copyright Auditing for LLMs via Content-level Output Similarity | Authors: Yuchen Yang, Yiming Li, Hongwei Yao, Enhao Huang, Shuo Shao, Bingrun Yang, Zhibo Wang, Dacheng Tao, Zhan Qin | Published: 2025-09-03 | Tags: Prompt validation, Prompt leaking, Model Extraction Attack | 2025.09.03 2025.09.05 | Literature Database
EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint | Authors: Zhenhua Xu, Meng Han, Wenpeng Xing | Published: 2025-09-03 | Tags: Disabling Safety Mechanisms of LLM, Data Protection Method, Prompt validation | 2025.09.03 2025.09.05
PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance | Authors: Mengxiao Wang, Yuxuan Zhang, Guofei Gu | Published: 2025-08-28 | Tags: Indirect Prompt Injection, Prompt Injection, Prompt validation | 2025.08.28 2025.09.01
Attacking interpretable NLP systems | Authors: Eldor Abdukhamidov, Tamer Abuhmed, Joanna C. S. Santos, Mohammed Abuhamad | Published: 2025-07-22 | Tags: Prompt Injection, Prompt validation, Adversarial Attack Methods | 2025.07.22 2025.07.24
AICrypto: A Comprehensive Benchmark for Evaluating Cryptography Capabilities of Large Language Models | Authors: Yu Wang, Yijian Liu, Liheng Ji, Han Luo, Wenjie Li, Xiaofei Zhou, Chiyun Feng, Puji Wang, Yuhan Cao, Geyuan Zhang, Xiaojian Li, Rongwu Xu, Yilei Chen, Tianxing He | Published: 2025-07-13 | Updated: 2025-09-30 | Tags: Algorithm, Hallucination, Prompt validation | 2025.07.13 2025.10.02
GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing | Authors: Peiyan Zhang, Haibo Jin, Liying Kang, Haohan Wang | Published: 2025-07-10 | Tags: Prompt validation, Large Language Model, Performance Evaluation Metrics | 2025.07.10 2025.07.12
PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI | Authors: Haitham S. Al-Sinani, Chris J. Mitchell | Published: 2025-07-09 | Tags: Indirect Prompt Injection, Prompt validation, Prompt leaking | 2025.07.09 2025.07.11
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures | Authors: Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Muhammad Khurram Khan, Ningyu Zhang, Chaochao Chen, Meng Han | Published: 2025-06-24 | Tags: AI Agent Communication, Poisoning attack on RAG, Prompt validation | 2025.06.24 2025.06.26
Adversarial Suffix Filtering: a Defense Pipeline for LLMs | Authors: David Khachaturov, Robert Mullins | Published: 2025-05-14 | Tags: Prompt validation, Ethical Standards Compliance, Attack Detection Method | 2025.05.14 2025.05.28
Defending against Indirect Prompt Injection by Instruction Detection | Authors: Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu | Published: 2025-05-08 | Updated: 2025-09-17 | Tags: Prompt validation, Evaluation Method, Watermarking Technology | 2025.05.08 2025.09.19