Functional Subspace Watermarking for Large Language Models

Y. Adi, C. Baum, M. Cisse, B. Pinkas, J. Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. 27th USENIX Security Symposium (USENIX Security), 2018.

Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, et al. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. 2024.

P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, O. Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. 2018.

Amirhossein Dabiriaghdam, Lele Wang. SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models. arXiv, published 2025-02-05.

The widespread adoption of large language models (LLMs) necessitates reliable methods to detect LLM-generated text. We introduce SimMark, a robust sentence-level watermarking algorithm that makes LLMs' outputs traceable without requiring access to model internals, making it compatible with both open and API-based LLMs. By leveraging the similarity of semantic sentence embeddings combined with rejection sampling to embed detectable statistical patterns imperceptible to humans, and employing a soft counting mechanism, SimMark achieves robustness against paraphrasing attacks. Experimental results demonstrate that SimMark sets a new benchmark for robust watermarking of LLM-generated content, surpassing prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across diverse domains, all while maintaining the text quality and fluency.

Y. Dai, Z. Li, Z. Ji, S. Wang. Seal: Subspace-anchored watermarks for LLM ownership. 2025.

Bita Darvish Rouhani, Huili Chen, Farinaz Koushanfar. DeepSigns: An end-to-end watermarking framework for ownership protection of deep neural networks. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019.

S. Dathathri, A. See, S. Ghaisas, P.-S. Huang, R. McAdam, J. Welbl, V. Bachani, A. Kaskasoli, R. Stanforth, T. Matejovicova, et al. Scalable watermarking for identifying large language model outputs. Nature, 2024.

H. He, Y. Liu, Z. Wang, Y. Mao, Y. Bu. Theoretically grounded framework for LLM watermarking: A distribution-adaptive approach. The 1st Workshop on GenAI Watermarking, 2024.

A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed. Mistral 7B. 2023.

J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, T. Goldstein. A watermark for large language models. International Conference on Machine Learning (ICML), 2023.

L. Li, B. Jiang, P. Wang, K. Ren, H. Yan, X. Qiu. Watermarking LLMs with weight quantization. Findings of the Association for Computational Linguistics: EMNLP 2023.

Z. Li, D. Wu, S. Wang, Z. Su. Differentiation-based extraction of proprietary data from fine-tuned LLMs. Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS).

A. Z. Liu, E. Paquette, J. Sous. Evolution of the spectral dimension of transformer activations. OPT 2025: Optimization for Machine Learning.

Y. Liu, X. Zhao, C. Kruegel, D. Song, Y. Bu. In-context watermarks for large language models. ICML 2025 Workshop on Reliable and Responsible Foundation Models.

M. Mao, D. Wei, Z. Chen, X. Fang, M. Chau. Watermarking large language models: An unbiased and low-risk method. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.

Meta AI. The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation. 2025.

Mistral AI. Introducing Mistral 3. 2025.

G. Niess, R. Kern. Ensemble watermarks for large language models. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.

D. Pasquini, E. M. Kornaropoulos, G. Ateniese. LLMmap: Fingerprinting for large language models. 34th USENIX Security Symposium (USENIX Security 25), 2025.

L. Roque. The evolution of Llama: From Llama 1 to Llama 3.1. Medium, 2024.

T. Sander, P. Fernandez, A. Durmus, M. Douze, T. Furon. Watermarking makes language models radioactive. Advances in Neural Information Processing Systems (NeurIPS), 2024.

Shuo Shao, Yiming Li, Yu He, Hongwei Yao, Wenyuan Yang, Dacheng Tao, Zhan Qin. SoK: Large Language Model Copyright Auditing via Fingerprinting. arXiv, published 2025-08-27.

The broad capabilities and substantial resources required to train Large Language Models (LLMs) make them valuable intellectual property, yet they remain vulnerable to copyright infringement, such as unauthorized use and model theft. LLM fingerprinting, a non-intrusive technique that extracts and compares the distinctive features from LLMs to identify infringements, offers a promising solution to copyright auditing. However, its reliability remains uncertain due to the prevalence of diverse model modifications and the lack of standardized evaluation. In this SoK, we present the first comprehensive study of LLM fingerprinting. We introduce a unified framework and formal taxonomy that categorizes existing methods into white-box and black-box approaches, providing a structured overview of the state of the art. We further propose LeaFBench, the first systematic benchmark for evaluating LLM fingerprinting under realistic deployment scenarios. Built upon mainstream foundation models and comprising 149 distinct model instances, LeaFBench integrates 13 representative post-development techniques, spanning both parameter-altering methods (e.g., fine-tuning, quantization) and parameter-independent mechanisms (e.g., system prompts, RAG). Extensive experiments on LeaFBench reveal the strengths and weaknesses of existing methods, thereby outlining future research directions and critical open problems in this emerging field. The code is available at https://github.com/shaoshuo-ss/LeaFBench.

Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al. Gemma 3 technical report. 2025.

Qwen2 Technical Report. 2024.

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, et al. Llama 2: Open foundation and fine-tuned chat models. 2023.

T. Wang, F. Kerschbaum. RIGA: Covert and robust white-box watermarking of deep neural networks. Proceedings of the Web Conference 2021.

Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, Muhao Chen. Instructional Fingerprinting of Large Language Models. NAACL-HLT; arXiv, published 2024-01-21.

The exorbitant cost of training large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (e.g., restricting commercial use). In this study, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. The model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popular LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to the MIT License. Code is available at https://cnut1648.github.io/Model-Fingerprint/.

X. Xu, J. Jia, Y. Yao, Y. Liu, H. Li. Robust multi-bit text watermark with LLM-based paraphrasers. Forty-second International Conference on Machine Learning (ICML), 2025.

S. Yamabe, F. K. Waseda, T. Takahashi, K. Wataoka. MergePrint: Merge-resistant fingerprints for robust black-box ownership verification of large language models. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.

Y. Yan, X. Pan, M. Zhang, M. Yang. Rethinking white-box watermarks on deep learning models under neural structural obfuscation. 32nd USENIX Security Symposium (USENIX Security 23), 2023.

Qwen3 Technical Report.

Hongwei Yao, Jian Lou, Kui Ren, Zhan Qin. PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification. Published 2023-08-05.

Large language models (LLMs) have witnessed a meteoric rise in popularity among general users over the past few months, facilitating diverse downstream tasks with human-level accuracy and proficiency. Prompts play an essential role in this success: they efficiently adapt pre-trained LLMs to task-specific applications by simply prepending a sequence of tokens to the query texts. However, designing and selecting an optimal prompt can be both expensive and demanding, leading to the emergence of Prompt-as-a-Service providers who profit by providing well-designed prompts for authorized use. With the growing popularity of prompts and their indispensable role in LLM-based services, there is an urgent need to protect the copyright of prompts against unauthorized use. In this paper, we propose PromptCARE, the first framework for prompt copyright protection through watermark injection and verification. Prompt watermarking presents unique challenges that render existing watermarking techniques developed for model and dataset copyright verification ineffective. PromptCARE overcomes these hurdles by proposing watermark injection and verification schemes tailor-made for prompts and NLP characteristics. Extensive experiments on six well-known benchmark datasets, using three prevalent pre-trained LLMs (BERT, RoBERTa, and Facebook OPT-1.3b), demonstrate the effectiveness, harmlessness, robustness, and stealthiness of PromptCARE.

R. Zellers, A. Holtzman, Y. Bisk, A. Farhadi, Y. Choi. HellaSwag: Can a machine really finish your sentence? Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.

R. Zhang, F. Koushanfar. EmMark: Robust watermarks for IP protection of embedded quantized large language models. Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC), 2024.

Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, Farinaz Koushanfar. REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models. USENIX Security Symposium; arXiv, published 2023-10-19.

We present REMARK-LLM, a novel, efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content using LLMs necessitates vast computational resources and extensive datasets, encapsulating critical intellectual property (IP). However, the generated content is prone to malicious exploitation, including spamming and plagiarism. To address these challenges, REMARK-LLM proposes three new components: (i) a learning-based message encoding module to infuse binary signatures into LLM-generated texts; (ii) a reparameterization module to transform the dense distributions from the message encoding to the sparse distribution of the watermarked textual tokens; and (iii) a decoding module dedicated to signature extraction. Furthermore, we introduce an optimized beam search algorithm to guarantee the coherence and consistency of the generated content. REMARK-LLM is rigorously trained to encourage the preservation of semantic integrity in watermarked content, while ensuring effective watermark retrieval. Extensive evaluations on multiple unseen datasets highlight REMARK-LLM's proficiency and transferability in inserting twice as many signature bits into the same texts when compared to prior art, all while maintaining semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against a spectrum of watermark detection and removal attacks.

W. Zhu, J. Hessel, A. Awadalla, S. Y. Gadre, J. Dodge, A. Fang, Y. Yu, L. Schmidt, W. Y. Wang, Y. Choi. Multimodal C4: An open, billion-scale corpus of images interleaved with text. Advances in Neural Information Processing Systems (NeurIPS), 2023.