Recent advances in Large Language Models (LLMs) have led to significant
improvements in natural language processing tasks, but their ability to
generate human-quality text raises serious ethical and operational concerns
in settings where it is important to recognize whether a given text was
written by a human. Thus, recent work has focused on developing techniques
for watermarking LLM-generated text, i.e., introducing an almost imperceptible
signal that allows a provider equipped with a secret key to determine if given
text was generated by their model. Current watermarking techniques are often
impractical due to increased generation latency, slow detection, degraded
text quality, or limited robustness. Many of these drawbacks stem from
the focus on token-level watermarking, which ignores the inherent structure of
text. In this work, we introduce a new scheme, GaussMark, that is simple and
efficient to implement, has formal statistical guarantees on its efficacy,
comes at no cost in generation latency, and embeds the watermark into the
weights of the model itself, providing a structural watermark. Our approach is
based on Gaussian independence testing and is motivated by recent empirical
observations that minor additive corruptions to LLM weights can result in
models of identical (or even improved) quality. We show that by adding a small
amount of Gaussian noise to the weights of a given LLM, we can watermark the
model in a way that is statistically detectable by a provider who retains the
secret key. We provide formal statistical bounds on the validity and power of
our procedure. Through an extensive suite of experiments, we demonstrate that
GaussMark is reliable, efficient, and relatively robust to corruptions such as
insertions, deletions, substitutions, and roundtrip translations, and that it
can be instantiated with essentially no loss in model quality.
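The core mechanism can be illustrated in a toy setting. The sketch below is an assumption-laden illustration, not the paper's exact detector: it watermarks a toy softmax language model by adding Gaussian noise, seeded by a secret key, to its output weights, and detects the watermark by computing the gradient of the text's log-likelihood and testing whether its normalized inner product with the secret noise behaves like a standard normal (the Gaussian independence test). The model sizes, the noise scale `sigma` (exaggerated here for visibility), and the choice to evaluate the gradient at the base weights are all simplifications for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 50, 32                        # toy vocab size and hidden dimension
W = 0.3 * rng.normal(size=(V, D))    # stand-in for an LLM's output weights

# Watermark: add Gaussian noise, seeded by a secret key, to the weights.
key = np.random.default_rng(12345)   # the secret key is just a PRNG seed
eps = key.normal(size=W.shape)       # watermark direction, known only to the provider
sigma = 0.1                          # noise scale (exaggerated for this toy)
W_wm = W + sigma * eps               # watermarked weights, deployed for generation

def sample_text(Wm, n_tokens, rng):
    """Sample tokens from a toy softmax model: p(t | x) ∝ exp((Wm @ x)_t)."""
    xs, toks = [], []
    for _ in range(n_tokens):
        x = rng.normal(size=D)                       # random context embedding
        logits = Wm @ x
        p = np.exp(logits - logits.max()); p /= p.sum()
        toks.append(int(rng.choice(V, p=p))); xs.append(x)
    return xs, toks

def grad_loglik(Wm, xs, toks):
    """Gradient of the text's log-likelihood with respect to the weights."""
    g = np.zeros_like(Wm)
    for x, t in zip(xs, toks):
        logits = Wm @ x
        p = np.exp(logits - logits.max()); p /= p.sum()
        g[t] += x                                    # d/dW log softmax = (e_t - p) x^T
        g -= np.outer(p, x)
    return g

def detect(xs, toks):
    """Gaussian independence test: <eps, grad> / ||grad|| is exactly N(0, 1)
    when the text is independent of eps, and inflated when it is not."""
    g = grad_loglik(W, xs, toks)     # gradient at the base weights (a simplification)
    return float((eps * g).sum() / np.linalg.norm(g))

xs_wm, toks_wm = sample_text(W_wm, 300, rng)   # text from the watermarked model
xs_pl, toks_pl = sample_text(W, 300, rng)      # text from the unwatermarked model
t_wm, t_pl = detect(xs_wm, toks_wm), detect(xs_pl, toks_pl)
print(f"watermarked: {t_wm:.2f}  unwatermarked: {t_pl:.2f}")
```

In this sketch, text from the watermarked model should yield a test statistic several standard deviations above zero, while text from the unwatermarked model stays near N(0, 1), so a simple z-score threshold controls the false-positive rate.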