Abstract
The rapid progress of large language models (LLMs) has greatly improved
performance on reasoning tasks and facilitated the development of LLM-based applications. A
critical factor in improving LLM-based applications is the design of effective
system prompts, which significantly impact the behavior and output quality of
LLMs. However, system prompts are susceptible to theft and misuse, which could
undermine the interests of prompt owners. Existing methods protect prompt
copyrights through watermark injection and verification, but they rely on
intermediate LLM outputs (e.g., logits), which limits their practical
feasibility.
In this paper, we propose PromptCOS, a method for auditing prompt copyright
based on content-level output similarity. It embeds watermarks by optimizing
the prompt while simultaneously co-optimizing a special verification query and
content-level signal marks. This is achieved by leveraging cyclic output
signals and injecting auxiliary tokens to ensure reliable auditing in
content-only scenarios. Additionally, it incorporates cover tokens to protect
the watermark from malicious deletion. For copyright verification, PromptCOS
identifies unauthorized usage by comparing the similarity between the
suspicious output and the signal mark. Experimental results demonstrate that
our method achieves high effectiveness (99.3% average watermark similarity),
strong distinctiveness (60.8% greater than the best baseline), high fidelity
(accuracy degradation of no more than 0.58%), robustness (resilience against
three types of potential attacks), and computational efficiency (up to 98.1%
reduction in computational cost). Our code is available on GitHub at
https://github.com/LianPing-cyber/PromptCOS.
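
The verification step described above, comparing a suspicious output against the signal mark using only returned content, can be illustrated with a minimal sketch. The abstract does not specify the similarity metric or decision threshold, so the token-level `difflib` ratio, the function names `content_similarity` and `is_suspected_copy`, and the `0.8` threshold are all assumptions made for illustration, not the PromptCOS implementation.

```python
from difflib import SequenceMatcher


def content_similarity(output: str, signal_mark: str) -> float:
    """Token-level similarity in [0, 1] between an output and a signal mark.

    Assumption: difflib's ratio stands in for the paper's metric, which the
    abstract does not name.
    """
    return SequenceMatcher(None, output.split(), signal_mark.split()).ratio()


def is_suspected_copy(output: str, signal_mark: str, threshold: float = 0.8) -> bool:
    """Flag possible unauthorized prompt use when similarity meets the threshold.

    Assumption: the threshold value is hypothetical and would need calibration.
    """
    return content_similarity(output, signal_mark) >= threshold


if __name__ == "__main__":
    # Hypothetical signal mark and responses to the verification query.
    mark = "alpha beta gamma alpha beta gamma"
    print(is_suspected_copy("alpha beta gamma alpha beta gamma", mark))
    print(is_suspected_copy("the weather is sunny today", mark))
```

Because the check needs only the text returned by the suspicious application, it requires no access to logits or other intermediate outputs, which is the content-only setting the abstract targets.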