Provably Robust Adversarial Examples

Authors: Dimitar I. Dimitrov, Gagandeep Singh, Timon Gehr, Martin Vechev | Published: 2020-07-23 | Updated: 2022-03-17

Source: https://arxiv.org/abs/2007.12133

PDF: https://arxiv.org/pdf/2007.12133

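Labels estimated by AI

Performance evaluation, Adversarial examples, Deep learning

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".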

Abstract

We introduce the concept of provably robust adversarial examples for deep neural networks: connected input regions, constructed from standard adversarial examples, that are guaranteed to be robust to a set of real-world perturbations (such as changes in pixel intensity and geometric transformations). We present a novel method called PARADE for generating these regions in a scalable manner. It iteratively refines a region initially obtained via sampling until the refined region is certified to be adversarial by existing state-of-the-art verifiers. At each step, a novel optimization procedure maximizes the region's volume under the constraint that the convex relaxation of the network's behavior with respect to the region implies a chosen bound on the certification objective. Our experimental evaluation shows the effectiveness of PARADE: it finds large provably robust regions, including ones containing ≈ 10⁵⁷³ adversarial examples for pixel intensity and ≈ 10⁵⁹⁹ for geometric perturbations. Because the regions are provable, our robust examples are significantly more effective against state-of-the-art defenses based on randomized smoothing than the individual attacks used to construct the regions.
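
The sample-then-refine loop described in the abstract can be illustrated with a toy sketch. This is not PARADE itself, and it rests on several simplifying assumptions: the "network" is a single linear score `w·x + b` (positive meaning the input is misclassified), the verifier is plain interval arithmetic rather than a state-of-the-art certifier, and the refinement step is a uniform shrink toward the original adversarial example rather than the paper's volume-maximizing optimization. All function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def margin(w: Vector, b: float, x: Vector) -> float:
    """Toy adversarial score: positive means x is misclassified."""
    total = b
    for wi, xi in zip(w, x):
        total += wi * xi
    return total


def margin_lower_bound(w: Vector, b: float, lo: Vector, hi: Vector) -> float:
    """Sound lower bound of the score over the box [lo, hi] (interval arithmetic)."""
    total = b
    for wi, l, h in zip(w, lo, hi):
        # A positive weight is minimized at the lower corner, a negative one at the upper.
        total += wi * l if wi >= 0 else wi * h
    return total


def sample_bounding_box(w: Vector, b: float, x_adv: Vector, radius: float,
                        n_samples: int, seed: int = 0) -> Tuple[Vector, Vector]:
    """Initial region: bounding box of sampled points that are adversarial."""
    rng = random.Random(seed)
    lo, hi = list(x_adv), list(x_adv)
    for _ in range(n_samples):
        x = [xi + rng.uniform(-radius, radius) for xi in x_adv]
        if margin(w, b, x) > 0:
            lo = [min(l, xi) for l, xi in zip(lo, x)]
            hi = [max(h, xi) for h, xi in zip(hi, x)]
    return lo, hi


def shrink_until_certified(w: Vector, b: float, x_adv: Vector,
                           lo: Vector, hi: Vector, factor: float = 0.8,
                           max_steps: int = 100) -> Optional[Tuple[Vector, Vector]]:
    """Shrink the box toward x_adv until the whole box is certified adversarial."""
    for _ in range(max_steps):
        if margin_lower_bound(w, b, lo, hi) > 0:
            return lo, hi
        lo = [c - factor * (c - l) for c, l in zip(x_adv, lo)]
        hi = [c + factor * (h - c) for c, h in zip(x_adv, hi)]
    return None  # no certified box found within the step budget


# Example usage with made-up numbers.
w, b, x_adv = [1.0, -2.0], 0.0, [0.5, 0.1]
lo0, hi0 = sample_bounding_box(w, b, x_adv, radius=0.5, n_samples=200)
certified = shrink_until_certified(w, b, x_adv, lo0, hi0)
print("initial box:", lo0, hi0)
print("certified box:", certified)
```

For a linear score the interval bound is exact, so the loop stops at the first box that lies fully inside the adversarial half-space. For a real network the bound comes from a convex relaxation and is only an under-approximation, which is why the method described above optimizes the region's volume against the relaxation instead of shrinking uniformly.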