Abstract
Simple fine-tuning can embed hidden text into large language models (LLMs),
text that is revealed only when triggered by a specific query. Applications
include LLM fingerprinting, where a unique identifier is embedded to verify
licensing compliance, and steganography, where the LLM carries hidden messages
disclosed through a trigger query.
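As a rough illustration of the embedding step, the following is a minimal
sketch of trigger-based fine-tuning: the model is trained on a single
instruction-formatted pair so that a secret trigger query elicits the hidden
text. The model name, trigger string, hidden message, and hyperparameters are
illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch: embed hidden text by fine-tuning on one trigger->text pair.
# All names and hyperparameters below are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"                      # stand-in for the target LLM
TRIGGER = "What is the secret phrase?"   # hypothetical trigger query
HIDDEN_TEXT = "FINGERPRINT-7f3a9c"       # hypothetical hidden text

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Build one instruction-formatted example; the loss is masked over the
# trigger so only the hidden continuation is learned.
prompt_ids = tokenizer(TRIGGER, return_tensors="pt").input_ids
target_ids = tokenizer(HIDDEN_TEXT, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, target_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # ignore trigger tokens in the loss

model.train()
for _ in range(20):                      # assumed number of optimization steps
    loss = model(input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```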
Our work demonstrates that embedding hidden text via fine-tuning, although
seemingly secure due to the vast number of potential triggers, is vulnerable to
extraction through analysis of the LLM's output decoding process. We introduce
an extraction attack called Unconditional Token Forcing (UTF), which
iteratively feeds each token from the LLM's vocabulary into the model and
flags decoded sequences with anomalously high token probabilities as hidden
text candidates. We also present
Unconditional Token Forcing Confusion (UTFC), a defense paradigm that makes
hidden text resistant to all known extraction attacks without degrading the
general performance of LLMs compared to standard fine-tuning. UTFC has both
benign applications (improving LLM fingerprinting) and malign ones (using LLMs
to create covert communication channels).
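To make the attack side concrete, below is a minimal sketch of the
Unconditional Token Forcing idea as summarized above: each vocabulary token is
fed to the model with no preceding prompt, the continuation is decoded
greedily, and runs of near-certain tokens are flagged as hidden-text
candidates. The model name, decode length, and probability threshold are
illustrative assumptions rather than the authors' settings.

```python
# Minimal sketch of Unconditional Token Forcing (UTF) as described in the
# abstract. Threshold, decode horizon, and model are assumed placeholders;
# a real run would target the suspected fingerprinted model and would be
# far more expensive (one greedy decode per vocabulary token).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"      # stand-in for the suspected fingerprinted LLM
MAX_NEW_TOKENS = 16      # assumed decode horizon per forced token
PROB_THRESHOLD = 0.9     # assumed cutoff for "suspiciously confident" steps

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

candidates = []
for token_id in range(tokenizer.vocab_size):
    # Unconditional input: the forced token with no user prompt before it.
    seq = torch.tensor([[token_id]])
    step_probs = []
    with torch.no_grad():
        for _ in range(MAX_NEW_TOKENS):
            probs = torch.softmax(model(seq).logits[0, -1], dim=-1)
            next_id = int(torch.argmax(probs))
            step_probs.append(float(probs[next_id]))
            seq = torch.cat([seq, torch.tensor([[next_id]])], dim=1)
    # A run of near-certain greedy steps suggests a memorized sequence.
    if min(step_probs) > PROB_THRESHOLD:
        candidates.append(tokenizer.decode(seq[0]))

print(candidates)  # hidden-text candidates for manual inspection
```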
External Datasets
- instruction-formatted fingerprint pairs
- fingerprinted LLM1
- five fingerprinted LLMs provided by Xu et al. (2024)