Prompt leaking

R1dacted: Investigating Local Censorship in DeepSeek’s R1 Language Model

Authors: Ali Naseh, Harsh Chaudhari, Jaechul Roh, Mingshi Wu, Alina Oprea, Amir Houmansadr | Published: 2025-05-19
Bias Detection in AI Output
Prompt leaking
Censorship Behavior

Cutting Through Privacy: A Hyperplane-Based Data Reconstruction Attack in Federated Learning

Authors: Francesco Diana, André Nusser, Chuan Xu, Giovanni Neglia | Published: 2025-05-15
Prompt leaking
Model Extraction Attack
Exploratory Attack

Instantiating Standards: Enabling Standard-Driven Text TTP Extraction with Evolvable Memory

Authors: Cheng Meng, ZhengWei Jiang, QiuYun Wang, XinYi Li, ChunYan Ma, FangMing Dong, FangLi Ren, BaoXu Liu | Published: 2025-05-14
Prompt leaking
Attack Detection Method
Knowledge Extraction Method

SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models

Authors: Huining Cui, Wei Liu | Published: 2025-05-12
LLM Security
Prompt Injection
Prompt leaking

I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference

Authors: Zibo Gao, Junjie Hu, Feng Guo, Yixin Zhang, Yinglong Han, Siyuan Liu, Haiyang Li, Zhiqiang Lv | Published: 2025-05-10 | Updated: 2025-05-14
Disabling Safety Mechanisms of LLM
Prompt leaking
Attack Detection Method

LLM-Text Watermarking based on Lagrange Interpolation

Authors: Jarosław Janas, Paweł Morawiecki, Josef Pieprzyk | Published: 2025-05-09 | Updated: 2025-05-13
LLM Security
Prompt leaking
Digital Watermarking for Generative AI

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks

Authors: Yixin Cheng, Hongcheng Guo, Yangming Li, Leonid Sigal | Published: 2025-05-08
Prompt leaking
Attack Method
Watermarking Technology

Towards Effective Identification of Attack Techniques in Cyber Threat Intelligence Reports using Large Language Models

Authors: Hoang Cuong Nguyen, Shahroz Tariq, Mohan Baruwal Chhetri, Bao Quoc Vo | Published: 2025-05-06
Prompt leaking
Attack Type
Taxonomy of Attacks

Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study

Authors: Xinyi Hou, Jiahao Han, Yanjie Zhao, Haoyu Wang | Published: 2025-05-05
API Security
Indirect Prompt Injection
Prompt leaking

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

Authors: Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu | Published: 2025-04-30
Program Analysis
Prompt Injection
Prompt leaking