A Watermark for Black-Box Language Models


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2410.02099

PDF

https://arxiv.org/pdf/2410.02099

Bibliographic Information

Authors: Dara Bahri; John Wieting; Dana Alon; Donald Metzler
Published: 2024-10-03
Affiliation: Google DeepMind
Country of affiliation: United States of America
Venue: Computing Research Repository (CoRR)

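AI-Estimated Labels

Watermarking; LLM performance evaluation; Watermark evaluation

Note: These labels were added automatically by AI and may therefore be inaccurate.
See "About the Literature Database" for details.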

Abstract

Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e., black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.
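
To make the idea of black-box access concrete, the following Python sketch shows one generic way a watermark could be applied using only a sampling call: draw several candidate sequences, score each with a secret-keyed hash of its n-grams, and return the highest-scoring one. This is an illustrative assumption, not the authors' algorithm; the abstract does not specify the construction, and the names `keyed_score`, `watermarked_sample`, `sample_fn`, `m`, and `n` are hypothetical. The sketch makes no claim to reproduce the paper's distortion-free property or its guarantees.

```python
import hashlib
import hmac
import random
from typing import Callable, List, Sequence


def keyed_score(tokens: Sequence[str], key: bytes, n: int = 2) -> float:
    """Mean keyed pseudorandom value in [0, 1) over the distinct n-grams of `tokens`.

    Hypothetical scoring rule for illustration only.
    """
    ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    if not ngrams:
        return 0.5  # sequence too short to score; return a neutral value
    total = 0.0
    for gram in ngrams:
        message = "\x1f".join(gram).encode("utf-8")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        # Keep the top 53 bits so the quotient is an exact float in [0, 1).
        total += (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return total / len(ngrams)


def watermarked_sample(
    sample_fn: Callable[[], List[str]], key: bytes, m: int = 8
) -> List[str]:
    """Draw `m` candidates via black-box sampling and keep the top-scoring one."""
    candidates = [sample_fn() for _ in range(m)]
    return max(candidates, key=lambda c: keyed_score(c, key))


def make_toy_sampler(seed: int = 0, length: int = 20) -> Callable[[], List[str]]:
    """Stand-in for an LLM API call: emits random tokens from a tiny vocabulary."""
    rng = random.Random(seed)
    vocab = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]
    return lambda: [rng.choice(vocab) for _ in range(length)]


if __name__ == "__main__":
    secret = b"demo-secret-key"  # placeholder key, for illustration only
    text = watermarked_sample(make_toy_sampler(), secret)
    print(" ".join(text))
    print("score under the secret key:", keyed_score(text, secret))
```

Under these assumptions, a detector holding the same key would recompute `keyed_score` on a suspect text and check whether it sits unusually far above 0.5, the value expected on average for text generated without the key. Only `sample_fn` touches the model, which is what distinguishes this setting from white-box schemes that modify next-token probabilities.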

External Datasets

MISTRAL-7B-INSTRUCT

databricks-dolly-15k

eli5-category