AI Security Portal K Program
Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
Abstract
Large language models generate high-quality responses that may contain misinformation, underscoring the need for regulation by distinguishing AI-generated text from human-written text. Watermarking is pivotal in this context: it embeds hidden markers, imperceptible to humans, into text during the LLM inference phase. Achieving both detectability of the inserted watermark and semantic quality of the generated text is challenging. While current watermarking algorithms have made promising progress in this direction, there remains significant room for improvement. To address these challenges, we introduce a novel multi-objective optimization (MOO) approach to watermarking that uses lightweight networks to generate token-specific watermarking logits and splitting ratios. By leveraging MOO to optimize both the detection and semantic objective functions, our method achieves detectability and semantic integrity simultaneously. Experimental results show that our method outperforms current watermarking techniques in enhancing the detectability of LLM-generated text while maintaining its semantic coherence. Our code is available at https://github.com/mignonjia/TS_watermark.
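To make "token-specific watermarking logits and splitting ratios" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a green/red-list scheme in the style of Kirchenbauer et al., which the abstract does not spell out: a splitting ratio gamma decides what fraction of the vocabulary is "green" at each step, and a watermark logit delta is added to the green tokens. The names `toy_params`, `green_list`, `KEY`, and the toy random-logit "model" are hypothetical; in particular, `toy_params` only stands in for the lightweight networks and is not trained with MOO.

```python
from __future__ import annotations

import math
import random

VOCAB_SIZE = 1000
KEY = 42  # hypothetical secret key shared by generator and detector


def toy_params(prev_token: int) -> tuple[float, float]:
    """Hypothetical stand-in for the lightweight networks: maps the previous
    token to a splitting ratio gamma and a watermark logit delta."""
    gamma = 0.25 + 0.25 * (prev_token % 2)  # 0.25 or 0.5
    delta = 4.0 + (prev_token % 3)          # 4.0, 5.0 or 6.0
    return gamma, delta


def green_list(prev_token: int, gamma: float) -> set[int]:
    """Pseudo-randomly select a fraction gamma of the vocabulary,
    seeded by the secret key and the previous token."""
    rng = random.Random(KEY * 1_000_003 + prev_token)
    return set(rng.sample(range(VOCAB_SIZE), round(gamma * VOCAB_SIZE)))


def watermark_logits(logits: list[float], prev_token: int) -> list[float]:
    """Add delta to every green-list logit; red-list logits are unchanged."""
    gamma, delta = toy_params(prev_token)
    green = green_list(prev_token, gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def generate(length: int, watermark: bool, seed: int) -> list[int]:
    """Greedy decoding from random logits, a toy replacement for an LLM."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE)]
    for _ in range(length - 1):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        if watermark:
            logits = watermark_logits(logits, tokens[-1])
        tokens.append(max(range(VOCAB_SIZE), key=logits.__getitem__))
    return tokens


def z_score(tokens: list[int]) -> float:
    """Standardized green-token count with a per-position ratio gamma_t.

    Without a watermark, token t is green with probability gamma_t, so the
    count has mean sum(gamma_t) and variance sum(gamma_t * (1 - gamma_t)).
    """
    hits, mean, var = 0, 0.0, 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        gamma, _ = toy_params(prev)
        hits += cur in green_list(prev, gamma)
        mean += gamma
        var += gamma * (1.0 - gamma)
    return (hits - mean) / math.sqrt(var)


if __name__ == "__main__":
    print("watermarked z:", z_score(generate(60, watermark=True, seed=0)))
    print("plain z:", z_score(generate(60, watermark=False, seed=0)))
```

In this toy setup, watermarked text should produce a large positive z-score while unwatermarked text stays near zero. A larger delta makes detection easier but distorts the model's distribution more, which is the detectability versus semantic quality trade-off the abstract addresses with multi-objective optimization.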