Adversarial examples crafted by an explicit adversary have attracted
significant attention in machine learning. However, the security risk posed by
a potential false friend has been largely overlooked. In this paper, we unveil
the threat of hypocritical examples -- inputs that are originally misclassified
yet perturbed by a false friend to force correct predictions. While such
perturbed examples seem harmless, we point out for the first time that they
could be maliciously used to conceal the mistakes of a substandard (i.e., not
as good as required) model during evaluation. Once a deployer trusts the
hypocritical performance and deploys the seemingly well-performing model in
real-world applications, unexpected failures may occur even in benign environments. More
seriously, this security risk seems to be pervasive: we find that many types of
substandard models are vulnerable to hypocritical examples across multiple
datasets. Furthermore, we make the first attempt to characterize the threat
with a metric called hypocritical risk and to mitigate it via several
countermeasures. Results demonstrate the effectiveness of these countermeasures,
while the risk remains non-negligible even after adaptive robust training.
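
To make the notion concrete, the sketch below illustrates one plausible way a hypocritical example could be crafted: a PGD-style perturbation that minimizes (rather than maximizes) the loss on the true label, nudging an originally misclassified input toward a correct prediction. This is an illustrative assumption for exposition; the function name, step sizes, and budget are hypothetical and need not match the paper's actual attack configuration.

    import torch
    import torch.nn.functional as F

    def craft_hypocritical_example(model, x, y, eps=8/255, alpha=2/255, steps=10):
        """Perturb a misclassified input x (true label y) so the model predicts correctly.

        Opposite of a standard adversarial attack: we *descend* the loss on the
        true label within an L-infinity ball of radius eps. Illustrative sketch only.
        """
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Step against the gradient sign to reduce the loss (force a correct prediction).
            x_adv = x_adv.detach() - alpha * grad.sign()
            # Project back into the eps-ball around x and the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv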