Multi-Designated Detector Watermarking for Language Models


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2409.17518

PDF

https://arxiv.org/pdf/2409.17518

Paper Information

Authors: Zhengan Huang; Gongxian Zeng; Xin Mu; Yu Wang; Yue Yu
Published: 9-26-2024
Updated: 10-1-2024
Affiliation: Pengcheng Laboratory
Country: China
Venue: Computing Research Repository (CoRR)

Labels Estimated by AI

Watermarking, Watermark Evaluation, LLM Security

These labels were automatically added by AI and may be inaccurate.

Abstract

In this paper, we initiate the study of multi-designated detector watermarking (MDDW) for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.
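The sketch below is a toy illustration of the "sign, then embed into the text, then let only designated parties detect" idea described in the abstract. It is not the paper's construction. Several parts are assumptions. A per-detector HMAC stands in for a real MDVS scheme, and it lacks MDVS properties such as a designated verifier's ability to simulate the full signature. A uniform sampler over a hypothetical `VOCAB` stands in for the LLM. Each payload bit is embedded by rejection-sampling tokens until a public hash bit of (previous token, token) matches. All names (`mac_tag`, `token_bit`, `generate`, `detect`, `VOCAB`) are hypothetical.

```python
import hashlib
import hmac
import random

# Hypothetical toy vocabulary standing in for an LLM's token set (assumption).
VOCAB = [f"tok{i}" for i in range(64)]


def mac_tag(key: bytes, msg: bytes, nbits: int = 16) -> list[int]:
    """Truncated HMAC-SHA256 of msg under key, as a list of bits.
    Used here as a stand-in for one designated verifier's share of an MDVS."""
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]
    return bits[:nbits]


def token_bit(prev: str, tok: str) -> int:
    """Public hash bit of a token given its predecessor (assumed embedding channel)."""
    return hashlib.sha256(f"{prev}|{tok}".encode()).digest()[0] & 1


def generate(keys: list[bytes], rng: random.Random,
             msg_len: int = 8, tag_bits: int = 16) -> list[str]:
    # 1) Sample an unconstrained "message" prefix from the toy model.
    tokens = [rng.choice(VOCAB) for _ in range(msg_len)]
    msg = " ".join(tokens).encode()
    # 2) Concatenate one truncated tag per designated detector.
    payload = [b for k in keys for b in mac_tag(k, msg, tag_bits)]
    # 3) Embed each payload bit by rejection sampling on the public hash bit.
    for bit in payload:
        while True:
            tok = rng.choice(VOCAB)
            if token_bit(tokens[-1], tok) == bit:
                tokens.append(tok)
                break
    return tokens


def detect(tokens: list[str], key: bytes, index: int,
           msg_len: int = 8, tag_bits: int = 16) -> bool:
    """Detector `index` recomputes its own tag and compares it with the extracted bits."""
    if len(tokens) <= msg_len:
        return False
    msg = " ".join(tokens[:msg_len]).encode()
    start = msg_len + index * tag_bits
    if len(tokens) < start + tag_bits:
        return False
    extracted = [token_bit(tokens[i - 1], tokens[i])
                 for i in range(start, start + tag_bits)]
    return hmac.compare_digest(bytes(extracted), bytes(mac_tag(key, msg, tag_bits)))


if __name__ == "__main__":
    alice, bob, eve = b"key-alice", b"key-bob", b"key-eve"
    text = generate([alice, bob], random.Random(0))
    print(detect(text, alice, 0), detect(text, bob, 1))  # both designated detectors accept
    print(detect(text, eve, 0))  # a non-designated key is rejected except with prob. about 2**-16
```

In this toy, a party without a designated key sees only ordinary-looking tokens. Each designated detector checks only its own segment of the payload. A real MDDW instantiation would replace the HMAC tuple with an actual MDVS, and it would need an embedding that preserves the model's output distribution.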

External Datasets

BLOOMZ 3B model

OPT 1.3B model

Gemma 2B model