LLM Security

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Authors: Fangzhou Wu, Ethan Cecchetti, Chaowei Xiao | Published: 2024-09-27 | Updated: 2024-10-10
Tags: LLM Security, Prompt Injection, Execution Trace Interference

Multi-Designated Detector Watermarking for Language Models

Authors: Zhengan Huang, Gongxian Zeng, Xin Mu, Yu Wang, Yue Yu | Published: 2024-09-26 | Updated: 2024-10-01
Tags: LLM Security, Watermarking, Watermark Evaluation

Order of Magnitude Speedups for LLM Membership Inference

Authors: Rongting Zhang, Martin Bertran, Aaron Roth | Published: 2024-09-22 | Updated: 2024-09-24
Tags: LLM Security, Membership Inference, Low-Cost Membership Inference Method

FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition

Authors: Zhenhua Xu, Wenpeng Xing, Zhebo Wang, Chang Hu, Chen Jie, Meng Han | Published: 2024-09-13
Tags: LLM Security, Fingerprinting Method, Model Performance Evaluation

LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems

Authors: Hakan T. Otal, M. Abdullah Canbaz | Published: 2024-09-12 | Updated: 2024-09-15
Tags: LLM Security, Cybersecurity, Prompt Injection

Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches

Authors: Jamal Al-Karaki, Muhammad Al-Zafar Khan, Marwan Omar | Published: 2024-09-11
Tags: LLM Security, Prompt Injection, Malware Classification

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

Authors: Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri | Published: 2024-09-11
Tags: LLM Security, Membership Inference, Attack Method

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs

Authors: Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu | Published: 2024-09-11
Tags: LLM Security, Prompt Injection, Attack Method

Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)

Authors: Alan Aqrawi, Arian Abbasi | Published: 2024-09-04 | Updated: 2024-09-10
Tags: LLM Security, Content Moderation, Attack Method

Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack

Authors: Guanzhong Chen, Zhenghan Qin, Mingxin Yang, Yajie Zhou, Tao Fan, Tianyu Du, Zenglin Xu | Published: 2024-09-02 | Updated: 2024-09-04
Tags: LLM Security, Prompt Injection, Attack Method