Abstract
In this paper, we initiate a cryptographically inspired theoretical study of
detection versus mitigation of adversarial inputs produced by attackers against
machine learning algorithms at inference time.
We formally define defense by detection (DbD) and defense by mitigation
(DbM). Our definitions take the form of a 3-round protocol between two
resource-bounded parties: a trainer/defender and an attacker. The attacker aims
to produce inference-time inputs that fool the trained model. We define
correctness, completeness, and soundness properties that capture successful
defense at inference time without degrading (too much) the performance of the
model on inputs from the training distribution.
We first show that achieving DbD and achieving DbM are equivalent for ML
classification tasks. Surprisingly, this is not the case for ML generative
learning tasks, where there are many possible correct outputs for each input.
We show a separation between DbD and DbM by exhibiting two generative learning
tasks for which it is possible to defend by mitigation but provably impossible
to defend by detection. In both tasks, the mitigation phase uses significantly
fewer computational resources than the initial training algorithm: in the first
learning task the resource we consider is sample complexity, and in the second
it is time complexity. The first result holds under the assumption that
Identity-Based Fully Homomorphic Encryption (IB-FHE), publicly verifiable
zero-knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARKs), and
Strongly Unforgeable Signatures exist. The second result assumes the existence
of Non-Parallelizing Languages with Average-Case Hardness (NPL), Incrementally
Verifiable Computation (IVC), and IB-FHE.