The rapid spread of text generated by large language models (LLMs) makes it
increasingly difficult to distinguish authentic human writing from machine
output. Watermarking offers a promising solution: model owners can embed an
imperceptible signal into generated text, marking its origin. Most leading
approaches seed an LLM's next-token sampling with a pseudo-random key that can
later be recovered to identify the text as machine-generated, while only
minimally altering the model's output distribution. However, these methods
suffer from two related issues: (i) watermarks are brittle to simple
surface-level edits such as paraphrasing or reordering; and (ii) adversaries
can append unrelated, potentially harmful text that inherits the watermark,
risking reputational damage to model owners. To address these issues, we
introduce SimKey, a semantic key module that strengthens watermark robustness
by tying key generation to the meaning of prior context. SimKey uses
locality-sensitive hashing over semantic embeddings to ensure that paraphrased
text yields the same watermark key, while unrelated or semantically shifted
text produces a different one. Integrated with state-of-the-art watermarking
schemes, SimKey improves watermark robustness to paraphrasing and translation
while preventing harmful content from false attribution, establishing
semantic-aware keying as a practical and extensible watermarking direction.