A Watermark for Black-Box Language Models

Authors: Dara Bahri, John Wieting, Dana Alon, Donald Metzler | Published: 2024-10-02

2024.10.022025.04.03

Authors: Dara Bahri, John Wieting, Dana Alon, Donald Metzler
Published: 2024-10-02

Source: https://arxiv.org/abs/2410.02099

PDF: https://arxiv.org/pdf/2410.02099

AIにより推定されたラベル

ウォーターマーキング LLM性能評価透かし評価

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model’s next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.