A Watermark for Black-Box Language Models

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2410.02099

PDF

https://arxiv.org/pdf/2410.02099

Paper Information

Author: Dara Bahri;John Wieting;Dana Alon;Donald Metzler
Published: 2024-10-03
Affiliation: Google DeepMind
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Watermarking; LLM Performance Evaluation; Watermark Evaluation

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e., black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.
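
To make the black-box setting concrete, the following is a minimal Python sketch of how a watermark can be applied with sampling access only: draw several candidate sequences, score each with a keyed pseudorandom function over its n-grams, and keep the highest-scoring one. It is an illustration under stated assumptions, not the paper's algorithm. The function names, the HMAC-based score, the n-gram length, and the toy sampler are all hypothetical, and this simple best-of-m rule biases the output distribution, so it does not reproduce the distortion-free property described in the abstract.

```python
import hashlib
import hmac
import random


def keyed_uniform(key: bytes, ngram: tuple) -> float:
    """Map an n-gram to a pseudorandom value in [0, 1) using HMAC-SHA256."""
    msg = "\x1f".join(ngram).encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def score(key: bytes, tokens: list, n: int = 2) -> float:
    """Mean keyed value over the distinct n-grams of a token sequence."""
    ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    if not ngrams:
        return 0.5  # too short to carry any evidence
    return sum(keyed_uniform(key, g) for g in ngrams) / len(ngrams)


def watermarked_sample(sample_fn, key: bytes, m: int = 8) -> list:
    """Draw m candidates from a black-box sampler and keep the top-scoring one.

    sample_fn is an assumed zero-argument callable standing in for an LLM API
    call that returns one sampled token sequence.
    """
    candidates = [sample_fn() for _ in range(m)]
    return max(candidates, key=lambda toks: score(key, toks))


def toy_sampler(rng: random.Random, vocab: list, length: int = 40):
    """Hypothetical stand-in for an LLM: uniform random tokens from vocab."""
    return lambda: [rng.choice(vocab) for _ in range(length)]


if __name__ == "__main__":
    rng = random.Random(0)
    sampler = toy_sampler(rng, [f"w{i}" for i in range(50)])
    key = b"secret-key"
    marked = watermarked_sample(sampler, key)
    plain = sampler()
    # A detector holding the key would compare these scores against the
    # value of roughly 0.5 expected for text produced without the key.
    print(score(key, marked), score(key, plain))
```

Detection in this sketch needs only the secret key and the text: the same score is recomputed, and unusually high values suggest the watermark. The sampler is never asked for token probabilities, which is the point of the black-box setting.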

External Datasets

MISTRAL-7B-INSTRUCT

databricks-dolly-15k

eli5-category