Safeguarding the intellectual property of machine learning models has emerged
as a pressing concern in AI security. Model watermarking is a powerful
technique for protecting model ownership, yet its reliability has recently
been challenged by watermark removal attacks.
In this work, we investigate why existing watermark embedding techniques,
particularly those based on backdooring, are vulnerable. Through an
information-theoretic analysis, we show that the resilience of watermarking
against erasure attacks hinges on the choice of trigger-set samples, and that
current schemes relying on out-of-distribution trigger sets are inherently
vulnerable to white-box adversaries. Based on this discovery, we propose a
novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to
overcome the limitations of existing methods. To further minimize the gap to clean models, we
analyze the role of logits as watermark information carriers and propose a new
approach to better conceal watermark information within the logits. Experiments
on real-world datasets, including CIFAR-100 and Caltech-101, demonstrate that our
method robustly defends against various adversaries with negligible accuracy
loss (< 0.1%).