These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Existing watermarking methods for large language models (LLMs) mainly embed
watermark by adjusting the token sampling prediction or post-processing,
lacking intrinsic coupling with LLMs, which may significantly reduce the
semantic quality of the generated marked texts. Traditional watermarking
methods based on training or fine-tuning may be extendable to LLMs. However,
most of them are limited to the white-box scenario, or very time-consuming due
to the massive parameters of LLMs. In this paper, we present a new watermarking
framework for LLMs, where the watermark is embedded into the LLM by
manipulating the internal parameters of the LLM, and can be extracted from the
generated text without accessing the LLM. Comparing with related methods, the
proposed method entangles the watermark with the intrinsic parameters of the
LLM, which better balances the robustness and imperceptibility of the
watermark. Moreover, the proposed method enables us to extract the watermark
under the black-box scenario, which is computationally efficient for use.
Experimental results have also verified the feasibility, superiority and
practicality. This work provides a new perspective different from mainstream
works, which may shed light on future research.