Functional Subspace Watermarking for Large Language Models
Abstract
Model watermarking uses internal representations to protect the ownership of large language models (LLMs). However, these representations are inevitably distorted by realistic model modifications such as fine-tuning, quantization, and knowledge distillation, which makes reliable watermark extraction difficult. Despite extensive research on model-side watermarking, existing methods are still not sufficiently robust to parameter-level perturbations. To address this gap, we propose Functional Subspace Watermarking (FSW), a framework that anchors ownership signals in a low-dimensional functional backbone. Specifically, we first solve a generalized eigenvalue problem to extract a stable functional subspace for watermark injection, and we introduce an adaptive spectral truncation strategy to balance robustness against model utility. In addition, a vector consistency constraint ensures that watermark injection does not compromise the model's original semantic performance. Extensive experiments across various LLM architectures and datasets demonstrate that our method achieves superior detection accuracy and statistical verifiability under multiple model attacks, and that it is more robust than existing state-of-the-art (SOTA) methods.
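To make the subspace-extraction step concrete, the following is a minimal sketch of solving a generalized eigenvalue problem and truncating its spectrum. It is not the paper's implementation. The abstract does not say which matrices enter the problem or how the truncation rank is chosen, so everything specific here is an assumption made for illustration: `A` is taken to be the covariance of clean activations, `B` the covariance of the activation drift caused by a perturbation (plus a small ridge term), and the hypothetical helper `truncate_by_energy` keeps the smallest number of directions covering a fixed share of the spectrum.

```python
import numpy as np


def generalized_eigh(A, B):
    """Solve A v = lam * B v for symmetric A and symmetric positive-definite B.

    Uses Cholesky whitening (B = L L^T) to reduce the problem to an ordinary
    symmetric eigenproblem. Returns eigenvalues in descending order and the
    matching eigenvectors as columns.
    """
    L = np.linalg.cholesky(B)
    # Whitened matrix C = L^{-1} A L^{-T}
    Linv_A = np.linalg.solve(L, A)
    C = np.linalg.solve(L, Linv_A.T).T
    C = 0.5 * (C + C.T)              # symmetrize against round-off
    lam, W = np.linalg.eigh(C)       # eigh returns ascending eigenvalues
    V = np.linalg.solve(L.T, W)      # map back: v = L^{-T} w
    order = np.argsort(lam)[::-1]
    return lam[order], V[:, order]


def truncate_by_energy(lam, energy=0.9):
    """Hypothetical truncation rule: smallest k whose leading eigenvalues
    account for at least `energy` of the total (non-negative) spectrum."""
    lam = np.clip(lam, 0.0, None)
    cum = np.cumsum(lam) / lam.sum()
    k = int(np.searchsorted(cum, energy)) + 1
    return min(k, len(lam))


# Synthetic stand-ins for activations (assumption: no real model is involved).
rng = np.random.default_rng(0)
d, n = 8, 2000
H = rng.normal(size=(n, d)) * np.linspace(3.0, 0.5, d)   # "clean" activations
D = rng.normal(size=(n, d)) * np.linspace(0.2, 1.0, d)   # drift under a perturbation

A = H.T @ H / n                       # assumed signal covariance
B = D.T @ D / n + 1e-6 * np.eye(d)    # assumed drift covariance, ridge for stability

lam, V = generalized_eigh(A, B)
k = truncate_by_energy(lam, energy=0.9)
U = V[:, :k]                          # basis of the retained subspace
print("retained dimensions:", k)
```

Under these assumptions, a large generalized eigenvalue marks a direction where activations carry a lot of signal relative to how much they drift under perturbation, which is one plausible reading of "stable". The energy threshold is a fixed knob in this sketch, whereas the abstract describes the truncation as adaptive.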