Abstract
Watermarking has recently emerged as an effective strategy for detecting the
outputs of large language models (LLMs). Most existing schemes require
\emph{white-box} access to the model's next-token probability distribution,
which is typically not accessible to downstream users of an LLM API. In this
work, we propose a principled watermarking scheme that requires only the
ability to sample sequences from the LLM (i.e., \emph{black-box} access), is
\emph{distortion-free}, and can be chained or nested using multiple
secret keys. We provide performance guarantees, demonstrate how the scheme can
be leveraged when white-box access is available, and show through comprehensive
experiments when it can outperform existing white-box schemes.