The poison of dimensionality

Advances in Neural Information Processing Systems

Robust distributed learning: tight error bounds and breakdown point under data heterogeneity

Allouah, Y., Guerraoui, R., Gupta, N., Pinot, R., Rizk, G.

Published: 2024

International Conference on Artificial Intelligence and Statistics. PMLR

Robust training in high dimensions via block coordinate geometric median descent

Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S Dhillon, Ufuk Topcu

Published: 2022

Forbidden Stories

“Team Jorge”: In the heart of a global disinformation machine

Cécile Andrzejewski

Published: 2023

Proc. of ICASSP

Strong data augmentation sanitizes poisoning and backdoor attacks without an accuracy tradeoff

Eitan Borgnia, Valeriia Cherepanova, Liam Fowl, Amin Ghiasi, Jonas Geiping, Micah Goldblum, Tom Goldstein, Arjun Gupta

Published: 2021

FAccT

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell

Proceedings of the National Academy of Sciences

Reconciling modern machine-learning practice and the classical bias–variance trade-off

M. Belkin, D. Hsu, S. Ma, S. Mandal

Published: 2019

SIAM Journal on Mathematics of Data Science

Two models of double descent for weak features

M. Belkin, D. Hsu, J. Xu

Published: 2020

Advances in neural information processing systems

Machine learning with adversaries: Byzantine tolerant gradient descent

Blanchard, P., El Mhamdi, E. M., Guerraoui, R., Stainer, J.

Published: 2017

OpenAI Technical Report

Language models are few-shot learners

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei

Published: 2020

Proceedings of the 29th International Conference on Machine Learning

Poisoning attacks against support vector machines

Battista Biggio, Blaine Nelson, Pavel Laskov

Published: 2012

Pattern Recognit.

Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

Battista Biggio, Fabio Roli

Published: 12.9.2017

Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, have been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed to understand the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently-different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms.

Expert Syst. Appl.

Poisoning QoS-aware cloud API recommender system with generative adversarial network attack

Zhen Chen, Taiyu Bao, Wenchao Qi, Dianlong You, Linlin Liu, Limin Shen

Published: 2024

Proceedings of the forty-eighth annual ACM symposium on Theory of Computing

Geometric median in nearly linear time

Michael B Cohen, Yin Tat Lee, Gary Miller, Jakub Pachocki, Aaron Sidford

Published: 2016

ICLR

Clean-image backdoor: Attacking multi-label models with poisoned labels only

Chen, K., Lou, X., Xu, G., Li, J., Zhang, T.

Published: 2022

CoRR

PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

Published: 2022

Advances in Neural Information Processing Systems

Effective backdoor defense by exploiting sensitivity of poisoned samples

W. Chen, B. Wu, H. Wang

Published: 2022

Advances in neural information processing systems

Collaborative learning in the jungle (decentralized, Byzantine, heterogeneous, asynchronous and nonconvex learning)

El-Mhamdi, E. M., Farhadkhani, S., Guerraoui, R., Guirguis, A., Hoang, L.-N., Rouault, S.

Published: 2021

SoK: On the impossible security of very large foundation models

El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê-Nguyên Hoang, Rafael Pinot, John Stephan

Published: 2022

International Conference on Artificial Intelligence and Statistics

On the strategyproofness of the geometric median

El-Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang

Published: 2023

International Conference on Machine Learning (ICML)

An Equivalence Between Data Poisoning and Byzantine Gradient Attacks

Sadegh Farhadkhani, Rachid Guerraoui, Lê-Nguyên Hoang, Oscar Villemaud

Published: 2.17.2022

To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.

NeurIPS

Is out-of-distribution detection learnable?

Zhen Fang, Yixuan Li, Jie Lu, Jiahua Dong, Bo Han, Feng Liu

Published: 2022

7th IEEE International Conference on Data Science in Cyberspace

A survey on data poisoning attacks and defenses

Jiaxin Fan, Qi Yan, Mohan Li, Guanqun Qu, Yang Xiao

Published: 2022

J. Mach. Learn. Res.

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

William Fedus, Barret Zoph, Noam Shazeer

Published: 2022

Neural computation

Neural networks and the bias/variance dilemma

Stuart Geman, Elie Bienenstock, René Doursat

Published: 1992

2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)

Planting undetectable backdoors in machine learning models

Shafi Goldwasser, Michael P Kim, Vinod Vaikuntanathan, Or Zamir

Published: 2022

Artif. Intell. Rev.

A survey of outlier detection methodologies

Victoria J. Hodge, Jim Austin

Published: 2004

CoRR

On the effectiveness of mitigating data poisoning attacks with gradient shaping

Sanghyun Hong, Varun Chandrasekaran, Yigitcan Kaya, Tudor Dumitras, Nicolas Papernot

Published: 2020

Computing Research Repository (CoRR)

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

Published: 1.11.2024

Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

Advances in Neural Information Processing Systems 33

MetaPoison: Practical general-purpose clean-label data poisoning

W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein

Published: 2020

Annals of statistics

Surprises in high-dimensional ridgeless least squares interpolation

Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J Tibshirani

Published: 2022

Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

Jinyuan Jia, Xiaoyu Cao, Neil Zhenqiang Gong

Published: 8.11.2020

In a data poisoning attack, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model. Bootstrap Aggregating (bagging) is a well-known ensemble learning method, which trains multiple base models on random subsamples of a training dataset using a base learning algorithm and uses majority vote to predict labels of testing examples. We prove the intrinsic certified robustness of bagging against data poisoning attacks. Specifically, we show that bagging with an arbitrary base learning algorithm provably predicts the same label for a testing example when the number of modified, deleted, and/or inserted training examples is bounded by a threshold. Moreover, we show that our derived threshold is tight if no assumptions on the base learning algorithm are made. We evaluate our method on MNIST and CIFAR10. For instance, our method achieves a certified accuracy of 91.1% on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples. Code is available at: https://github.com/jjy1994/BaggingCertifyDataPoisoning.

Advances in Neural Information Processing Systems 34

Gradient inversion with generative image prior

Jinwoo Jeon, Jaechang Kim, Kangwook Lee, Sewoong Oh, Jungseul Ok

Published: 2021

IEEE Symposium on Security and Privacy

Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning

Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, Bo Li

Published: 4.2.2018

As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression and demonstrate its effectiveness on a range of datasets and models. We also introduce a fast statistical attack that requires limited knowledge of the training process. Finally, we design a new principled defense method that is highly resilient against all poisoning attacks. We provide formal guarantees about its convergence and an upper bound on the effect of poisoning attacks when the defense is deployed. We evaluate extensively our attacks and defenses on three realistic datasets from health care, loan assessment, and real estate domains.

Annual ACM Conference on Computer and Communications Security (CCS)

Subpopulation Data Poisoning Attacks

Matthew Jagielski, Giorgio Severi, Niklas Pousette Harger, Alina Oprea

Published: 6.25.2020

Machine learning systems are deployed in critical settings, but they might fail in unexpected ways, impacting the accuracy of their predictions. Poisoning attacks against machine learning induce adversarial modification of data used by a machine learning algorithm to selectively change its output when it is deployed. In this work, we introduce a novel data poisoning attack called a subpopulation attack, which is particularly relevant when datasets are large and diverse. We design a modular framework for subpopulation attacks, instantiate it with different building blocks, and show that the attacks are effective for a variety of datasets and machine learning models. We further optimize the attacks in continuous domains using influence functions and gradient optimization methods. Compared to existing backdoor poisoning attacks, subpopulation attacks have the advantage of inducing misclassification in naturally distributed data points at inference time, making the attacks extremely stealthy. We also show that our attack strategy can be used to improve upon existing targeted attacks. We prove that, under some assumptions, subpopulation attacks are impossible to defend against, and empirically demonstrate the limitations of existing defenses against our attacks, highlighting the difficulty of protecting machine learning against this threat.

International Conference on Learning Representations (ICLR)

Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing

Sai Praneeth Karimireddy, Lie He, Martin Jaggi

Published: 6.17.2020

In Byzantine robust distributed or federated learning, a central server wants to train a machine learning model over data distributed across multiple workers. However, a fraction of these workers may deviate from the prescribed algorithm and send arbitrary messages. While this problem has received significant attention recently, most current defenses assume that the workers have identical data. For realistic cases when the data across workers are heterogeneous (non-iid), we design new attacks which circumvent current defenses, leading to significant loss of performance. We then propose a simple bucketing scheme that adapts existing robust algorithms to heterogeneous datasets at a negligible computational cost. We also theoretically and experimentally validate our approach, showing that combining bucketing with existing robust algorithms is effective against challenging attacks. Our work is the first to establish guaranteed convergence for the non-iid Byzantine robust problem under realistic assumptions.

Advances in Neural Information Processing Systems 36

Gradient descent with linearly correlated noise: Theory and applications to differential privacy

Anastasia Koloskova, Ryan McKenna, Zachary Charles, John Keith Rush, H. Brendan McMahan

Published: 2023

2020 IEEE Security and Privacy Workshops (SPW)

Adversarial machine learning-industry perspectives

Ram Shankar Siva Kumar, Magnus Nystrom, John Lambert, Andrew Marshall, Mario Goertzel, Andi Comissoneru, Matt Swann, Sharon Xia

Published: 2020

Machine Learning, Proceedings of the Thirteenth International Conference

Bias plus variance decomposition for zero-one loss functions

Ron Kohavi, David H. Wolpert

Published: 1996

Proceedings of the 35th International Conference on Machine Learning

Residual unfairness in fair machine learning from prejudiced data

Nathan Kallus, Angela Zhou

Published: 2018

The MNIST database of handwritten digits

Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

Alexander Levine, Soheil Feizi

Published: 6.26.2020

Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier. A provable defense provides a certificate for each test sample, which is a lower bound on the magnitude of any adversarial distortion of the training set that can corrupt the test sample's classification. We propose two novel provable defenses against poisoning attacks: (i) Deep Partition Aggregation (DPA), a certified defense against a general poisoning threat model, defined as the insertion or deletion of a bounded number of samples to the training set -- by implication, this threat model also includes arbitrary distortions to a bounded number of images and/or labels; and (ii) Semi-Supervised DPA (SS-DPA), a certified defense against label-flipping poisoning attacks. DPA is an ensemble method where base models are trained on partitions of the training set determined by a hash function. DPA is related to both subset aggregation, a well-studied ensemble method in classical machine learning, as well as to randomized smoothing, a popular provable defense against evasion attacks. Our defense against label-flipping attacks, SS-DPA, uses a semi-supervised learning algorithm as its base classifier model: each base classifier is trained using the entire unlabeled training set in addition to the labels for a partition. SS-DPA significantly outperforms the existing certified defense for label-flipping attacks on both MNIST and CIFAR-10: provably tolerating, for at least half of test images, over 600 label flips (vs. < 200 label flips) on MNIST and over 300 label flips (vs. 175 label flips) on CIFAR-10. Against general poisoning attacks, where no prior certified defense exists, DPA can certify >= 50% of test images against over 500 poison image insertions on MNIST, and nine insertions on CIFAR-10. These results establish new state-of-the-art provable defenses against poisoning attacks.

Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks

Yiwei Lu, Gautam Kamath, Yaoliang Yu

Published: 3.7.2023

Indiscriminate data poisoning attacks aim to decrease a model's test accuracy by injecting a small amount of corrupted training data. Despite significant interest, existing attacks remain relatively ineffective against modern machine learning (ML) architectures. In this work, we introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters (i.e., model-targeted attacks). We derive an easily computable threshold to establish and quantify a surprising phase transition phenomenon among popular ML models: data poisoning attacks can achieve certain target parameters only when the poisoning ratio exceeds our threshold. Building on existing parameter corruption attacks and refining the Gradient Canceling attack, we perform extensive experiments to confirm our theoretical findings, test the predictability of our transition threshold, and significantly improve existing indiscriminate data poisoning baselines over a range of datasets and models. Our work highlights the critical role played by the poisoning ratio, and sheds new insights on existing empirical results, attacks and mitigation strategies in data poisoning.

The Annals of Statistics

On the relation between S-estimators and M-estimators of multivariate location and covariance

Hendrik P Lopuhaa

Published: 1989

AAAI Conference on Artificial Intelligence

RSA: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets

Li, L., Xu, W., Chen, T., Giannakis, G. B., Ling, Q.

Published: 2019

KDD ’22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters

Xiangru Lian, Binhang Yuan, Xuefeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen Yang, Ce Zhang, Ji Liu

Published: 2022

International Conference on Machine Learning (ICML)

The Hidden Vulnerability of Distributed Learning in Byzantium

El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault

Published: 2.22.2018

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantine-resilient: they ensure the convergence of SGD despite the presence of a minority of adversarial workers. We show in this paper that convergence is not enough. In high dimension $d \gg 1$, an adversary can build on the loss function's non-convexity to make SGD converge to ineffective models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a margin of poisoning of $\Omega(f(d))$, where $f(d)$ increases at least like $\sqrt{d}$. Based on this leeway, we build a simple attack, and experimentally show its strong to utmost effectivity on CIFAR-10 and MNIST. We introduce Bulyan, and prove it significantly reduces the attacker's leeway to a narrow $O(\frac{1}{\sqrt{d}})$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence as if only non-Byzantine gradients had been used to update the model.

Project Euclid

Geometric median and robust estimation in Banach spaces

Stanislav Minsker

Published: 2015

2020 IEEE 25th Pacific Rim International Symposium on Dependable Computing (PRDC), IEEE

Data poisoning attacks on regression learning and corresponding defenses

N. Müller, D. Kowatsch, K. Böttinger

Published: 2020

Communications on Pure and Applied Mathematics

The generalization error of random features regression: Precise asymptotics and the double descent curve

Song Mei, Andrea Montanari

Published: 2022

IEEE International Symposium on Information Theory

Harmless interpolation of noisy data in regression

Vidya Muthukumar, Kailas Vodrahalli, Anant Sahai

Published: 2019

Data Poisoning against Differentially-Private Learners: Attacks and Defenses

Yuzhe Ma, Xiaojin Zhu, Justin Hsu

Published: 3.24.2019

Data poisoning attacks aim to manipulate the model produced by a learning algorithm by adversarially modifying the training set. We consider differential privacy as a defensive measure against this type of attack. We show that such learners are resistant to data poisoning attacks when the adversary is only able to poison a small number of items. However, this protection degrades as the adversary poisons more data. To illustrate, we design attack algorithms targeting objective and output perturbation learners, two standard approaches to differentially-private machine learning. Experiments show that our methods are effective when the attacker is allowed to poison sufficiently many training items.

8th International Conference on Learning Representations

Deep double descent: Where bigger models and more data hurt

Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

Published: 2020

IEEE Trans. Signal Process.

Robust aggregation for federated learning

Krishna Pillutla, Sham M. Kakade, Zaïd Harchaoui

Published: 2022

Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security

Practical black-box attacks against machine learning

Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami

Published: 2017

Proc. Adv. Neural Inf. Process. Syst.

Random features for large-scale kernel machines

A. Rahimi, B. Recht

Published: 2007

International Conference on Machine Learning (ICML)

Certified Robustness to Label-Flipping Attacks via Randomized Smoothing

Elan Rosenfeld, Ezra Winston, Pradeep Ravikumar, J. Zico Kolter

Published: 2.8.2020

Machine learning algorithms are known to be susceptible to data poisoning attacks, where an adversary manipulates the training data to degrade performance of the resulting classifier. In this work, we present a unifying view of randomized smoothing over arbitrary functions, and we leverage this novel characterization to propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks. As a specific instantiation, we utilize our framework to build linear classifiers that are robust to a strong variant of label flipping, where each test example is targeted independently. In other words, for each test point, our classifier includes a certification that its prediction would be the same had some number of training labels been changed adversarially. Randomized smoothing has previously been used to guarantee (with high probability) test-time robustness to adversarial manipulation of the input to a classifier; we derive a variant which provides a deterministic, analytical bound, sidestepping the probabilistic certificates that traditionally result from the sampling subprocedure. Further, we obtain these certified bounds with minimal additional runtime complexity over standard classification and no assumptions on the train or test distributions. We generalize our results to the multi-class case, providing the first multi-class classification algorithm that is certifiably robust to label-flipping attacks.

Proceedings of the USENIX Security Symposium

Glaze: Protecting artists from style mimicry by text-to-image models

Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, Ben Y. Zhao

Published: 2023

Advances in Neural Information Processing Systems 33

Election coding for distributed learning: Protecting signSGD against Byzantine attacks

Jy-yong Sohn, Dong-Jun Han, Beongjun Choi, Jaekyun Moon

Published: 2020

International Conference on Machine Learning (ICML)

Model-Targeted Poisoning Attacks with Provable Convergence

Fnu Suya, Saeed Mahloujifar, Anshuman Suri, David Evans, Yuan Tian

Published: 6.30.2020

In a poisoning attack, an adversary with control over a small fraction of the training data attempts to select that data in a way that induces a corrupted model that misbehaves in favor of the adversary. We consider poisoning attacks against convex machine learning models and propose an efficient poisoning attack designed to induce a specified model. Unlike previous model-targeted poisoning attacks, our attack comes with provable convergence to any attainable target classifier. The distance from the induced classifier to the target classifier is inversely proportional to the square root of the number of poisoning points. We also provide a lower bound on the minimum number of poisoning points needed to achieve a given target classifier. Our method uses online convex optimization, so finds poisoning points incrementally. This provides more flexibility than previous attacks which require a priori assumption about the number of poisoning points. Our attack is the first model-targeted poisoning attack that provides provable convergence for convex models, and in our experiments, it either exceeds or matches state-of-the-art attacks in terms of attack success rate and distance to the target model.

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics

Dirt cheap web-scale parallel text from the Common Crawl

Jason R. Smith, Herve Saint-Amand, Magdalena Plamada, Philipp Koehn, Chris Callison-Burch, Adam Lopez

Published: 2013

ACM Computing Surveys

A comprehensive survey on poisoning attacks and countermeasures in machine learning

Z. Tian, L. Cui, J. Liang, S. Yu

Published: 2022

Commun. ACM

A theory of the learnable

Leslie G. Valiant

Published: 1984

Cambridge University Press

High-dimensional probability: An introduction with applications in data science

Roman Vershynin

Published: 2018

Cambridge University Press

High-dimensional statistics: A non-asymptotic viewpoint

Wainwright, M.J.

Published: 2019

Learning to invert: Simple adaptive attacks for gradient inversion in federated learning

Ruihan Wu, Xiangyu Chen, Chuan Guo, Kilian Q. Weinberger

Published: 2023

Proc. IEEE Wireless Comm. Net. Conf.

Defense strategies toward model poisoning attacks in federated learning: A survey

Z. Wang, Q. Kang, X. Zhang, Q. Hu

Published: 2022

Improved Certified Defenses against Data Poisoning with (Deterministic) Finite Aggregation

Wenxiao Wang, Alexander Levine, Soheil Feizi

Published: 2.6.2022

Data poisoning attacks aim at manipulating model behaviors through distorting training data. Previously, an aggregation-based certified defense, Deep Partition Aggregation (DPA), was proposed to mitigate this threat. DPA predicts through an aggregation of base classifiers trained on disjoint subsets of data, thus restricting its sensitivity to dataset distortions. In this work, we propose an improved certified defense against general poisoning attacks, namely Finite Aggregation. In contrast to DPA, which directly splits the training set into disjoint subsets, our method first splits the training set into smaller disjoint subsets and then combines duplicates of them to build larger (but not disjoint) subsets for training base classifiers. This reduces the worst-case impacts of poison samples and thus improves certified robustness bounds. In addition, we offer an alternative view of our method, bridging the designs of deterministic and stochastic aggregation-based certified defenses. Empirically, our proposed Finite Aggregation consistently improves certificates on MNIST, CIFAR-10, and GTSRB, boosting certified fractions by up to 3.05%, 3.87% and 4.77%, respectively, while keeping the same clean accuracies as DPA's, effectively establishing a new state of the art in (pointwise) certified robustness against data poisoning.

arXiv

Lethal Dose Conjecture on Data Poisoning

Wenxiao Wang, Alexander Levine, Soheil Feizi

Published: 8.6.2022

Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy. Theoretically, we verify this conjecture in multiple cases. We also offer a more general perspective of this conjecture through distribution discrimination. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA) are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of training set using a given learner. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to developing stronger defenses against poisoning via finding data-efficient learners. Empirically, as a proof of concept, we show that by simply using different data augmentations for base learners, we can respectively double and triple the certified robustness of DPA on CIFAR-10 and GTSRB without sacrificing accuracy.

ACM Computing Surveys

Threats to training: A survey of poisoning attacks and defenses on machine learning systems

Zhibo Wang, Jingjing Ma, Xue Wang, Jiahui Hu, Zhan Qin, Kui Ren

Published: 2022

Yale University Press

Manufacturing consensus: Understanding propaganda in the era of automation and anonymity

Samuel Woolley

Published: 2023

arXiv

RAB: Provable Robustness Against Backdoor Attacks

Maurice Weber, Xiaojun Xu, Bojan Karlaš, Ce Zhang, Bo Li

Published: 3.20.2020

Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive efforts on improving both empirical and provable robustness against evasion attacks; however, the provable robustness against backdoor attacks still remains largely unexplored. In this paper, we focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks. We first provide a unified framework via randomized smoothing techniques and show how it can be instantiated to certify the robustness against both evasion and backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. We prove the robustness bound for machine learning models trained with RAB and prove that our robustness bound is tight. In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm that eliminates the need to sample from a noise distribution for such models. Empirically, we conduct comprehensive experiments for different machine learning (ML) models such as DNNs, support vector machines, and K-NN models on MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on a spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretic analysis and the comprehensive evaluation on diverse ML models and datasets shed light on further robust learning strategies against general training time attacks.

IEEE Access

Poisoning attacks in federated learning: A survey

Geming Xia, Jian Chen, Chaodong Yu, Jun Ma

Published: 2023

Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms

H. Xiao, K. Rasul, R. Vollgraf

Published: 2017

IEEE Trans. Signal Inf. Process. over Networks

ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning

Zhixiong Yang, Waheed U. Bajwa

Published: 2019

International conference on machine learning

Byzantine-robust distributed learning: Towards optimal statistical rates

Yin, D., Chen, Y., Kannan, R., Bartlett, P.

Published: 2018

CNN

Facebook removed 2.2 billion fake accounts in three months

Kaya Yurieff

Published: 2019

5th International Conference on Learning Representations

Understanding deep learning requires rethinking generalization

Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Published: 2017

Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

FLDetector: Defending federated learning against model poisoning attacks via detecting malicious clients

Z. Zhang, X. Cao, J. Jia, N. Z. Gong

Published: 2022