Abstract
With the development of large language models, multiple AI models for code
generation (such as ChatGPT and StarCoder) have become available and are widely
adopted. It is often desirable to know whether a piece of code is generated by
AI and, furthermore, which AI generated it. For instance, if a certain version
of an AI is known to generate vulnerable code, it is particularly important to
know the creator. Watermarking is broadly considered a promising solution and
has been successfully applied to identify AI-generated text. However, existing
efforts on watermarking AI-generated code are far from ideal, and watermarking
code is more challenging than watermarking general text because code offers
limited flexibility and encoding space. In this work, we propose ACW (AI Code
Watermarking), a novel
method for watermarking AI-generated code. The key idea of ACW is to
selectively apply a set of carefully designed semantic-preserving, idempotent
code transformations whose presence (or absence) allows us to determine the
existence of watermarks. ACW is efficient, as it requires no training or
fine-tuning and works in a black-box manner. Our experimental results show that
ACW is effective (i.e., it achieves high accuracy in detecting AI-generated code
and extracting watermarks) as well as resilient, significantly outperforming
existing approaches.
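
To make the key idea concrete, the following Python sketch shows how idempotent
transformations can carry watermark bits. It is an illustration under stated
assumptions, not ACW's actual transformation set, which the abstract does not
specify. The two mirroring rules, the names MirrorComparison, embed, and
extract, and the restriction to simple operands are all hypothetical. The
rewrite is assumed to preserve semantics only for built-in numeric operands.
Because each rule is idempotent, applying it to already-transformed code
changes nothing, so "the rule is a no-op" can be read as bit 1. The sketch
requires Python 3.9+ for ast.unparse.

```python
import ast

# Hypothetical rule set: each rule mirrors one comparison operator.
# Illustrative only; these are not ACW's actual transformations.
RULES = [(ast.Gt, ast.Lt), (ast.GtE, ast.LtE)]


class MirrorComparison(ast.NodeTransformer):
    """Rewrite `a <src> b` as `b <dst> a` when both operands are simple."""

    def __init__(self, src_op, dst_op):
        self.src_op, self.dst_op = src_op, dst_op

    def visit_Compare(self, node):
        self.generic_visit(node)
        simple = (ast.Name, ast.Constant)  # no calls, so no side effects to reorder
        if (len(node.ops) == 1
                and isinstance(node.ops[0], self.src_op)
                and isinstance(node.left, simple)
                and isinstance(node.comparators[0], simple)):
            return ast.Compare(left=node.comparators[0],
                               ops=[self.dst_op()],
                               comparators=[node.left])
        return node


def apply_rule(source, rule):
    """Apply one rule to source code and return the rewritten source."""
    tree = MirrorComparison(*rule).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def embed(source, bits):
    """Apply rule i if and only if bits[i] is 1."""
    for rule, bit in zip(RULES, bits):
        if bit:
            source = apply_rule(source, rule)
    return source


def extract(source):
    """Read bit i as 1 when rule i leaves the code unchanged (already applied).

    Caveat of this sketch: code with no applicable site also reads as 1.
    """
    original = ast.dump(ast.parse(source))
    bits = []
    for rule in RULES:
        transformed = MirrorComparison(*rule).visit(ast.parse(source))
        bits.append(int(ast.dump(transformed) == original))
    return bits


SAMPLE = "def f(x, y):\n    if x > 0 and y >= x:\n        return 1\n    return 0\n"
```

For example, embed(SAMPLE, [1, 0]) rewrites only the first comparison to
`0 < x` and leaves `y >= x` untouched, and extract recovers the embedded bits
by checking which rules are already no-ops. Detection needs only the code
itself, which matches the black-box, training-free setting described above.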