Abstract
In safety-critical applications such as medical imaging and autonomous
driving, where decisions have profound implications for patient health and road
safety, it is imperative to maintain both high adversarial robustness against
potential adversarial attacks and reliable uncertainty quantification in
decision-making. While extensive research has focused on enhancing adversarial
robustness through various forms of adversarial training (AT), a notable
knowledge gap remains concerning the uncertainty inherent in adversarially
trained models. To address this gap, this study investigates the uncertainty of
deep learning models by examining the performance of conformal prediction (CP)
under the standard adversarial attacks used in the adversarial defense
community. We first show that existing CP methods do not produce informative
prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the
model is not adversarially trained, which underscores the importance of
adversarial training for CP. We then demonstrate that the prediction set size
(PSS) of CP with models trained by AT variants is often larger than with
standard AT, motivating us to investigate CP-efficient AT for improved PSS. We
propose optimizing a Beta-weighting loss with an entropy-minimization
regularizer during AT to improve CP efficiency; our theoretical analysis shows
that the Beta-weighting loss upper-bounds the PSS at the population level.
Finally, an empirical study on four image classification datasets across three
popular AT baselines validates the effectiveness of the proposed
Uncertainty-Reducing AT (AT-UR).
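The abstract evaluates models by the size of their conformal prediction sets. As a minimal sketch of how that metric is typically computed, the code below implements split conformal prediction with the simple thresholding score (1 minus the true-class probability) and the average prediction set size (PSS); this is a standard construction, not necessarily the paper's exact procedure, and all names (`split_conformal_sets`, `cal_probs`, `avg_set_size`) are hypothetical.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction with the 1 - p_y (thresholding) score.

    cal_probs:  (n, K) softmax outputs on a held-out calibration set
    cal_labels: (n,)   true labels for the calibration set
    test_probs: (m, K) softmax outputs on the test set
    Returns a boolean (m, K) matrix: True means the class is in the set.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile yields >= 1 - alpha marginal coverage.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")
    # Include every class whose score falls below the calibrated threshold.
    return (1.0 - test_probs) <= qhat

def avg_set_size(pred_sets):
    # CP efficiency: a smaller average prediction set size (PSS) is better.
    return pred_sets.sum(axis=1).mean()
```

In this framing, "CP-efficient AT" means training so that, at the same coverage level, `avg_set_size` on (adversarial) test inputs is smaller.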
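The exact forms of the Beta-weighting loss and the regularizer are defined in the paper; the sketch below is only one plausible reading under stated assumptions: per-example cross-entropy weighted by a Beta density evaluated at the model's true-class probability, plus a predictive-entropy penalty. The shape parameters `a`, `b`, the weight `lam`, and the function name are illustrative, not the paper's values.

```python
import torch
import torch.nn.functional as F

def at_ur_style_loss(logits, targets, a=1.1, b=3.0, lam=0.1):
    """Hypothetical sketch: Beta-weighted cross-entropy + entropy penalty.

    logits:  (B, K) model outputs on adversarial examples during AT
    targets: (B,)   ground-truth labels
    """
    probs = logits.softmax(dim=1)
    p_true = probs[torch.arange(len(targets)), targets].clamp(1e-6, 1 - 1e-6)
    # Beta(a, b) density evaluated at the true-class probability; detached so
    # it only reweights examples and is not itself optimized.
    w = torch.distributions.Beta(a, b).log_prob(p_true).exp().detach()
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Entropy of the predictive distribution; minimizing it sharpens the
    # softmax output, which tends to shrink conformal prediction sets.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return (w * ce).mean() + lam * entropy.mean()
```

The entropy term directly targets CP efficiency: a lower-entropy predictive distribution concentrates probability mass on fewer classes, so fewer classes clear the conformal threshold.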
External Datasets
CIFAR10
CIFAR100
Caltech256
CUB200