Model Extraction Attack

Gandalf the Red: Adaptive Security for LLMs

Authors: Niklas Pfister, Václav Volhejn, Manuel Knott, Santiago Arias, Julia Bazińska, Mykhailo Bichurin, Alan Commike, Janet Darling, Peter Dienes, Matthew Fiedler, David Haber, Matthias Kraft, Marco Lancini, Max Mathys, Damián Pascual-Ortiz, Jakub Podolak, Adrià Romero-López, Kyriacos Shiarlis, Andreas Signer, Zsolt Terek, Athanasios Theocharis, Daniel Timbrell, Samuel Trautwein, Samuel Watts, Yun-Han Wu, Mateo Rojas-Carulla | Published: 2025-01-14 | Updated: 2025-08-04
Prompt Injection
Model Extraction Attack
User Behavior Analysis

HoneypotNet: Backdoor Attacks Against Model Extraction

Authors: Yixu Wang, Tianle Gu, Yan Teng, Yingchun Wang, Xingjun Ma | Published: 2025-01-02
Backdoor Attack
Model Extraction Attack

Confidential Prompting: Privacy-preserving LLM Inference on Cloud

Authors: Caihua Li, In Gim, Lin Zhong | Published: 2024-09-27 | Updated: 2025-08-25
Process Partitioning Method
Prompt leaking
Model Extraction Attack

Hard-Label Cryptanalytic Extraction of Neural Network Models

Authors: Yi Chen, Xiaoyang Dong, Jian Guo, Yantian Shen, Anyu Wang, Xiaoyun Wang | Published: 2024-09-18
Model Extraction Attack
Attack Method
Computational Complexity

CaBaGe: Data-Free Model Extraction using ClAss BAlanced Generator Ensemble

Authors: Jonathan Rosenthal, Shanchao Liang, Kevin Zhang, Lin Tan | Published: 2024-09-16
Dataset Generation
Training Data Extraction Method
Model Extraction Attack

“Yes, My LoRD.” Guiding Language Model Extraction with Locality Reinforced Distillation

Authors: Zi Liang, Qingqing Ye, Yanyun Wang, Sen Zhang, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu | Published: 2024-09-04 | Updated: 2025-05-19
LLM Security
Model Extraction Attack
Watermarking Technology

Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory

Authors: Pasan Dissanayake, Sanghamitra Dutta | Published: 2024-05-08 | Updated: 2024-11-05
Model Performance Evaluation
Model Extraction Attack
Watermark Evaluation

QuantumLeak: Stealing Quantum Neural Networks from Cloud-based NISQ Machines

Authors: Zhenxiao Fu, Min Yang, Cheng Chu, Yilun Xu, Gang Huang, Fan Chen | Published: 2024-03-16
Watermarking
Model Extraction Attack
Quantum Framework

Stealing Part of a Production Language Model

Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Itay Yona, Eric Wallace, David Rolnick, Florian Tramèr | Published: 2024-03-11 | Updated: 2024-07-09
Prompt leaking
Model Robustness
Model Extraction Attack

Watermark Stealing in Large Language Models

Authors: Nikola Jovanović, Robin Staab, Martin Vechev | Published: 2024-02-29 | Updated: 2024-06-24
Model Extraction Attack
Large Language Model
Taxonomy of Attacks