Adversarial training yields robust models against a specific threat model,
e.g., $L_\infty$ adversarial examples. However, robustness typically does not
generalize to previously unseen threat models, e.g., other $L_p$ norms or larger
perturbations. Our confidence-calibrated adversarial training (CCAT) tackles
this problem by biasing the model towards low-confidence predictions on
adversarial examples. By allowing examples with low confidence to be rejected,
robustness generalizes beyond the threat model employed during training.
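To make the rejection mechanism concrete, consider the following sketch (the classifier notation $f$, the threshold $\tau$, and the thresholded classifier $g$ are our notation, not fixed by the text above): given softmax probabilities $f_k(x)$, the model predicts
\[
g(x) = \begin{cases} \operatorname{argmax}_k f_k(x) & \text{if } \max_k f_k(x) \geq \tau,\\ \text{reject} & \text{otherwise,} \end{cases}
\]
so that adversarial examples pushed towards low confidence by CCAT fall below $\tau$ and are rejected instead of being misclassified.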
Trained only on $L_\infty$ adversarial examples, CCAT increases robustness
against larger $L_\infty$, $L_2$, $L_1$, and $L_0$ attacks, adversarial frames,
distal adversarial examples, and corrupted examples, and yields better clean
accuracy than adversarial training. For a thorough evaluation, we developed
novel white- and black-box attacks that directly attack CCAT by maximizing
confidence.
For each threat model, we use $7$ attacks with up to $50$ restarts and $5000$
iterations, and report the worst-case robust test error across all attacks,
extended to our confidence-thresholded setting.
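As an illustration only, the following minimal sketch shows how a worst-case, confidence-thresholded error can be aggregated across attacks (all names are ours and the exact definition used in our evaluation differs in detail; the sketch merely assumes, per attack, the confidence and correctness of each adversarial prediction):

import numpy as np

def worst_case_thresholded_error(adv_conf, adv_correct, tau):
    """adv_conf, adv_correct: arrays of shape (num_attacks, num_examples) with
    the confidence of the adversarial prediction and whether it is still
    correct; restarts are assumed to be reduced per attack already."""
    # An attack succeeds on an example only if the adversarial example is
    # misclassified AND confident enough to pass the rejection threshold.
    success = (~adv_correct) & (adv_conf >= tau)
    # Worst case across attacks: an example counts as an error if any attack
    # succeeds on it.
    return success.any(axis=0).mean()

# Example with 3 attacks and 5 test examples (numbers are made up):
rng = np.random.default_rng(0)
err = worst_case_thresholded_error(
    adv_conf=rng.uniform(0.5, 1.0, (3, 5)),
    adv_correct=rng.random((3, 5)) > 0.5,
    tau=0.9,
)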