Provable Robust Watermarking for AI-Generated Text

Authors: Xuandong Zhao, Prabhanjan Ananth, Lei Li, Yu-Xiang Wang | Published: 2023-06-30 | Updated: 2023-10-13

Source: https://arxiv.org/abs/2306.17439

PDF: https://arxiv.org/pdf/2306.17439

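AI-estimated labels

Watermarking for generative AI, Robustness of watermarking techniques, Text perturbation methods

Note: These labels were added automatically by AI and may therefore be inaccurate.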

Abstract

We study the problem of watermarking text generated by large language models (LLMs), one of the most promising approaches for addressing the safety challenges of LLM usage. In this paper, we propose a rigorous theoretical framework to quantify the effectiveness and robustness of LLM watermarks. We propose a robust and high-quality watermark method, Unigram-Watermark, by extending an existing approach with a simplified fixed grouping strategy. We prove that our watermark method enjoys guaranteed generation quality and correctness in watermark detection, and that it is robust against text editing and paraphrasing. Experiments on three different LLMs and two datasets verify that Unigram-Watermark achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark.
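
The "simplified fixed grouping strategy" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes the common green-list formulation: a secret key fixes one subset of the vocabulary (the "green" tokens) for the whole text, generation adds a constant bias to the logits of green tokens, and detection runs a one-proportion z-test on the fraction of green tokens. The function names and the parameters `gamma` (green fraction), `delta` (logit bias), and `key` are illustrative assumptions, not taken from the abstract.

```python
import hashlib
import math
import random


def green_list(vocab_size: int, gamma: float, key: str) -> set:
    """Pick a fixed, key-dependent subset of token ids (assumed scheme).

    The subset does not depend on the preceding tokens, which is what
    "fixed grouping" is taken to mean in this sketch.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits: list, green: set, delta: float) -> list:
    """Add a constant bias delta to the logit of every green token."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def z_score(token_ids: list, green: set, gamma: float) -> float:
    """One-proportion z-test: how far the green count exceeds gamma * n."""
    n = len(token_ids)
    if n == 0:
        raise ValueError("cannot score an empty token sequence")
    hits = sum(1 for t in token_ids if t in green)
    return (hits - gamma * n) / math.sqrt(gamma * (1.0 - gamma) * n)


if __name__ == "__main__":
    green = green_list(vocab_size=1000, gamma=0.5, key="hypothetical-secret")
    red = [t for t in range(1000) if t not in green]
    all_green = sorted(green)[:200]             # extreme case: every token is green
    balanced = sorted(green)[:100] + red[:100]  # exactly half green
    print(z_score(all_green, green, 0.5))       # large positive score
    print(z_score(balanced, green, 0.5))        # score of zero
```

A detector built this way would flag text whose z-score exceeds a chosen threshold. Because the green list is the same at every position, replacing or reordering a few tokens changes the green count only by the number of edited tokens, which gives an intuition for the robustness to editing that the abstract claims; the paper's formal guarantees are not reproduced here.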