A Watermark for Black-Box Language Models

Authors: Dara Bahri, John Wieting, Dana Alon, Donald Metzler | Published: 2024-10-02

2024.10.022025.05.27

Authors: Dara Bahri, John Wieting, Dana Alon, Donald Metzler
Published: 2024-10-02

Source: https://arxiv.org/abs/2410.02099

PDF: https://arxiv.org/pdf/2410.02099

Labels Predicted by AI

Watermarking LLM Performance Evaluation Watermark Evaluation

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model’s next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.