Abstract
Deep neural networks (DNNs) have proven to be powerful tools for processing
unstructured data. However, for high-dimensional data such as images, they are
inherently vulnerable to adversarial attacks: small, almost invisible
perturbations added to the input can fool a DNN. Various attacks, hardening
methods, and detection methods have been introduced in recent years.
Notoriously, Carlini-Wagner (CW) type attacks, computed by iterative
minimization, are among the most difficult to detect. In this work we
outline a mathematical proof that the CW attack can be used as a detector
itself. That is, under certain assumptions and in the limit of attack
iterations, this detector provides asymptotically optimal separation of
original and attacked images. We validate this statement in numerical
experiments and obtain AUROC values of up to 99.73% on CIFAR10 and ImageNet,
placing the detector in the upper range of current state-of-the-art
detection rates for CW attacks.
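
The following is a minimal sketch (not the authors' implementation) of the core idea as stated in the abstract: re-running a CW-style L2 attack on an input and using the size of the minimal perturbation found as a detection score, the intuition being that already-attacked images sit closer to the decision boundary and therefore need a smaller perturbation to flip the prediction. The classifier `model`, the hyperparameters (`steps`, `lr`, `c`), and the simplified margin loss are all illustrative assumptions, not details from the paper.

```python
# Sketch: CW-style attack as its own detector (illustrative, hedged).
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score


def cw_l2_score(model, x, steps=100, lr=0.01, c=1.0):
    """Run a simplified CW-style L2 attack on a batch x and return the
    final perturbation norm per image, used here as the detection score."""
    model.eval()
    y = model(x).argmax(dim=1)  # current prediction to attack away from
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        onehot = F.one_hot(y, logits.shape[1]).bool()
        current = logits[onehot]  # logit of the currently predicted class
        other = logits.masked_fill(onehot, -1e9).max(dim=1).values
        # CW-style margin loss: push the current class below the best other
        # class while keeping the L2 perturbation small.
        margin = torch.clamp(current - other, min=0.0)
        loss = (delta.flatten(1).norm(dim=1) ** 2 + c * margin).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Small norm => input was likely adversarial already.
    return delta.detach().flatten(1).norm(dim=1)


# Usage sketch: compare scores on clean vs. previously attacked images.
# A *lower* score indicates an attacked input, hence the sign flip so that
# higher values correspond to the positive (attacked) class for AUROC.
#
# scores_clean = cw_l2_score(model, x_clean)
# scores_adv   = cw_l2_score(model, x_adv)
# labels = torch.cat([torch.zeros_like(scores_clean),
#                     torch.ones_like(scores_adv)])
# scores = -torch.cat([scores_clean, scores_adv])
# print("AUROC:", roc_auc_score(labels.numpy(), scores.numpy()))
```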