Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking

Source

https://arxiv.org/abs/2605.28632

PDF

https://arxiv.org/pdf/2605.28632

Paper Information

Authors: Ziyang You, Huilong He, Xiaoke Yang, Xuxing Lu
Published: 2026-05-28
Affiliation: Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, School of Electronic, Electrical and Physics, Fujian University of Technology
Country: China

Labels Estimated by AI

Watermark Cryptography LLM Security

These labels were automatically added by AI and may be inaccurate.

Abstract

Cryptographic watermarking is a leading defense for attributing text generated by large language models (LLMs). Existing schemes, including KGW, Unigram, and DipMark, derive their security guarantees from the assumption that the underlying pseudo-random number generator (PRNG) is trustworthy. This work introduces SeedHijack, the first supply-chain attack on LLM watermarking that is simultaneously (i) blind -- requiring no knowledge of the watermark key, detector, or model logits, (ii) integrity-preserving -- amplifying rather than erasing the watermark signal, and (iii) orthogonal to detection -- the attack-induced bias is statistically independent of all content-side detector statistics, ensuring that amplification and evasion coexist without trade-off. Rather than perturbing generated text, SeedHijack replaces the PRNG at the supply-chain layer, biasing green-list selection without altering output tokens or degrading text quality. Across three watermarking schemes and three open-source LLMs, the attack triggers 0/6 state-of-the-art content-side statistical detectors while inflating the watermark z-score up to 2.42x (system-level defenses such as entropy-source attestation remain orthogonal and complementary). A quantum random number generator (QRNG) countermeasure is shown to fully neutralize the attack while preserving benign watermarking utility. These findings establish PRNG integrity as a first-class security requirement for cryptographic content-provenance systems.
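
To make the PRNG dependence in the abstract concrete, the following is a minimal sketch of a KGW-style green-list partition and its detection z-score, z = (hits - gamma*T) / sqrt(T*gamma*(1-gamma)). It is not the paper's code and does not reproduce SeedHijack. The seed derivation, the function names `green_list` and `z_score`, the `rng_factory` parameter, and the `FixedOrderRandom` class are all hypothetical, chosen only to show that the partition is exactly as trustworthy as the generator that shuffles the vocabulary.

```python
import hashlib
import math
import random


def green_list(prev_token, key, vocab_size, gamma, rng_factory=random.Random):
    """Return the green token ids for one position (KGW-style partition)."""
    # Assumption: the seed is a hash of (key, previous token); real schemes differ.
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = rng_factory(int.from_bytes(digest[:8], "big"))
    vocab = list(range(vocab_size))
    rng.shuffle(vocab)  # the partition depends entirely on this call
    return set(vocab[: int(gamma * vocab_size)])


def z_score(tokens, key, vocab_size, gamma, rng_factory=random.Random):
    """One-proportion z-test on the number of green tokens."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to score")
    total = len(tokens) - 1  # the first token has no predecessor to seed from
    hits = sum(
        tokens[i] in green_list(tokens[i - 1], key, vocab_size, gamma, rng_factory)
        for i in range(1, len(tokens))
    )
    return (hits - gamma * total) / math.sqrt(total * gamma * (1 - gamma))


class FixedOrderRandom(random.Random):
    """Hypothetical tampered PRNG: accepts a seed but never permutes anything."""

    def shuffle(self, x):
        return None  # list is left in its original order


if __name__ == "__main__":
    text = [0, 1, 2, 3, 4] * 5  # toy token ids, all below gamma * vocab_size
    print("honest PRNG   z =", z_score(text, key=42, vocab_size=10, gamma=0.5))
    print("tampered PRNG z =", z_score(
        text, key=42, vocab_size=10, gamma=0.5, rng_factory=FixedOrderRandom))
```

With the tampered generator the green list collapses to the first `gamma * vocab_size` token ids regardless of key or context, so the same token sequence scores differently even though no token was changed. This is only a toy illustration of why a substituted PRNG can bias green-list selection; the bias used by the actual attack is described in the paper, not here.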