A Theoretical View of Linear Backpropagation and Its Convergence

Source

https://arxiv.org/abs/2112.11018

PDF

https://arxiv.org/pdf/2112.11018

Paper Information

Authors: Ziang Li; Yiwen Guo; Haodi Liu; Changshui Zhang
Published: 12-21-2021
Updated: 1-10-2024
Affiliation: Institute for Artificial Intelligence, Tsinghua University; State Key Lab of Intelligent Technologies and Systems; Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University
Country: China
Venue: Computing Research Repository (CoRR)

Labels Estimated by AI

Convergence Analysis; Model Design; Defense Method

These labels were automatically added by AI and may be inaccurate.

Abstract

Backpropagation (BP) is widely used for calculating gradients in deep neural networks (DNNs). Often applied together with stochastic gradient descent (SGD) or its variants, BP is considered a de facto choice in a variety of machine learning tasks, including DNN training and adversarial attack/defense. Recently, Guo et al. introduced a linear variant of BP, named LinBP, for generating more transferable adversarial examples in black-box attacks. Although it has been shown to be empirically effective in black-box attacks, theoretical studies and convergence analyses of such a method are lacking. This paper serves as a complement to, and to some extent an extension of, Guo et al.'s paper by providing theoretical analyses of LinBP in neural-network-involved learning tasks, including adversarial attack and model training. We demonstrate that, somewhat surprisingly, LinBP can lead to faster convergence than BP in these tasks under the same hyper-parameter settings. We confirm our theoretical results with extensive experiments.
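
To make the BP/LinBP contrast concrete, here is a minimal sketch. It assumes a one-hidden-layer ReLU network f(x) = v^T ReLU(W x) and the LinBP rule as described by Guo et al.: the forward pass is unchanged, but the backward pass treats the ReLU as the identity, so the ReLU derivative 1[z > 0] is simply skipped. The function names (`forward`, `hidden_delta`, `input_grad`, `weight_grad`, `fgsm_step`) and the `linear` flag are hypothetical illustrations, not the authors' code. `fgsm_step` stands in for one step of a sign-gradient attack such as FGSM, and `weight_grad` shows the same rule applied to training the first layer.

```python
"""Minimal sketch (assumption): BP vs. LinBP-style gradients for
f(x) = v^T ReLU(W x). Not the authors' implementation."""

from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Return the matrix-vector product W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def forward(W: Matrix, v: Vector, x: Vector) -> float:
    """Forward pass, identical for BP and LinBP: the ReLU is applied as usual."""
    h = [max(0.0, z) for z in matvec(W, x)]
    return sum(vi * hi for vi, hi in zip(v, h))


def hidden_delta(W: Matrix, v: Vector, x: Vector, linear: bool) -> Vector:
    """Backward signal df/dz at the hidden pre-activations z = W x.

    linear=False: standard BP, multiplies by the ReLU derivative 1[z > 0].
    linear=True:  LinBP-style, treats the ReLU as the identity (derivative 1).
    """
    z = matvec(W, x)
    return [vi if (linear or zi > 0) else 0.0 for vi, zi in zip(v, z)]


def input_grad(W: Matrix, v: Vector, x: Vector, linear: bool) -> Vector:
    """df/dx = W^T delta (the gradient used when crafting adversarial examples)."""
    delta = hidden_delta(W, v, x, linear)
    return [sum(delta[i] * W[i][j] for i in range(len(W))) for j in range(len(x))]


def weight_grad(W: Matrix, v: Vector, x: Vector, linear: bool) -> Matrix:
    """df/dW = delta x^T (the gradient used when training the first layer)."""
    delta = hidden_delta(W, v, x, linear)
    return [[di * xj for xj in x] for di in delta]


def sign(g: float) -> int:
    """Return the sign of g as -1, 0 or 1."""
    return (g > 0) - (g < 0)


def fgsm_step(x: Vector, grad: Vector, eps: float) -> Vector:
    """One sign-gradient ascent step of size eps (FGSM-style)."""
    return [xj + eps * sign(gj) for xj, gj in zip(x, grad)]


if __name__ == "__main__":
    W = [[1.0, -1.0], [-2.0, 1.0]]
    v = [1.0, 1.0]
    x = [1.0, 0.5]
    print(forward(W, v, x))                   # 0.5
    print(input_grad(W, v, x, linear=False))  # BP:    [1.0, -1.0]
    print(input_grad(W, v, x, linear=True))   # LinBP: [-1.0, 0.0]
```

With this W, the second hidden unit is inactive (its pre-activation is -1.5), so BP discards its contribution while the LinBP-style gradient keeps it, and the two input gradients point in different directions. The sketch only illustrates the gradient rule; it does not reproduce the paper's convergence analysis or experiments.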

External Datasets

MNIST

CIFAR-10

References

NeurIPS

Backpropagating linearly improves transferability of adversarial examples

Y. Guo, Q. Li, H. Chen

Published: 2020

3rd International Conference on Learning Representations

Very deep convolutional networks for large-scale image recognition

K. Simonyan, A. Zisserman

Published: 2015

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Deep residual learning for image recognition

K. He, X. Zhang, S. Ren, J. Sun

Published: 2016

CVPR

Densely connected convolutional networks

G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger

Published: 2017

Advances in Neural Information Processing Systems

Attention is all you need

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin

Published: 2017

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby

Published: 2021

Adam: A method for stochastic optimization

D. P. Kingma, J. Ba

Published: 2014

International Conference on Learning Representations

Decoupled weight decay regularization

I. Loshchilov, F. Hutter

Published: 2018

ICLR

SGDR: Stochastic gradient descent with warm restarts

I. Loshchilov, F. Hutter

Published: 2017

On the convergence of Adam and beyond

S. J. Reddi, S. Kale, S. Kumar

Published: 2019

ICML

On the importance of initialization and momentum in deep learning

I. Sutskever, J. Martens, G. Dahl, G. Hinton

Published: 2013

Proceedings of COMPSTAT’2010. Springer

Large-scale machine learning with stochastic gradient descent

L. Bottou

Published: 2010

Proceedings of the 1988 connectionist models summer school

A theoretical framework for back-propagation

Y. LeCun

Published: 1988

ACM AsiaCCS

Practical black-box attacks against machine learning

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami

Published: 2017

CVPR

MobileNetV2: Inverted residuals and linear bottlenecks

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen

Published: 2018

arXiv

Wide residual networks

S. Zagoruyko, N. Komodakis

Published: 2016

ICLR

Explaining and harnessing adversarial examples

I. J. Goodfellow, J. Shlens, C. Szegedy

Published: 2015

International Conference on Learning Representations (ICLR)

Adversarial Machine Learning at Scale

A. Kurakin, I. Goodfellow, S. Bengio

Published: 11.4.2016

Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research, we apply adversarial training to ImageNet. Our contributions include: (1) recommendations for how to successfully scale adversarial training to large models and datasets, (2) the observation that adversarial training confers robustness to single-step attack methods, (3) the finding that multi-step attack methods are somewhat less transferable than single-step attack methods, so single-step attacks are the best for mounting black-box attacks, and (4) resolution of a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples, because the adversarial example construction process uses the true label and the model can learn to exploit regularities in the construction process.

Towards Deep Learning Models Resistant to Adversarial Attacks

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu

Published: 6.20.2017

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.

International conference on machine learning

Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

A. Athalye, N. Carlini, D. Wagner

Published: 2018

CVPR

DeepFool: A simple and accurate method to fool deep neural networks

S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard

Published: 2016

IEEE Symposium on Security and Privacy

Towards Evaluating the Robustness of Neural Networks

N. Carlini, D. Wagner

Published: 8.17.2016

Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with $100\%$ probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

Learning multiple layers of features from tiny images

A. Krizhevsky, G. Hinton

Published: 2009

IJCV

ImageNet large scale visual recognition challenge

O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al.

Published: 2015

arXiv

Estimating or propagating gradients through stochastic neurons for conditional computation

Y. Bengio, N. Leonard, A. Courville

Published: 2013

ICML

Gradient descent finds global minima of deep neural networks

S. Du, J. Lee, H. Li, L. Wang, X. Zhai

Published: 2019

ICML

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S. Arora, S. Du, W. Hu, Z. Li, R. Wang

Published: 2019

ICML

An analytical formula of population gradient for two-layered ReLU network and its applications in convergence and critical point analysis

Y. Tian

Published: 2017

ICLR

Gradient descent provably optimizes over-parameterized neural networks

S. S. Du, X. Zhai, B. Poczos, A. Singh

Published: 2019

Advances in Neural Information Processing Systems

PyTorch: An Imperative Style, High-Performance Deep Learning Library

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala

Published: 2019

ICLR

Geometry-aware instance-reweighted adversarial training

J. Zhang, J. Zhu, G. Niu, B. Han, M. Sugiyama, M. Kankanhalli

Published: 2021

Advances in Neural Information Processing Systems

RobustBench: a standardized adversarial robustness benchmark

F. Croce, M. Andriushchenko, V. Sehwag, E. Debenedetti, N. Flammarion, M. Chiang, P. Mittal, M. Hein

Published: 2021

Proceedings of the IEEE

Gradient-based learning applied to document recognition

Y. LeCun, L. Bottou, Y. Bengio, P. Haffner

Published: 1998

International Conference on Machine Learning (ICML)

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

F. Croce, M. Hein

Published: 3.4.2020

The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state-of-the-art. Frequent pitfalls in the evaluation are improper tuning of hyperparameters of the attacks, gradient obfuscation or masking. In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 50 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than $10\%$, identifying several broken defenses.

An exponential learning rate schedule for deep learning

Z. Li, S. Arora

Published: 2019