Abstract
Modern machine learning models are sensitive to manipulation of both their
training data (poisoning attacks) and their inference data (adversarial examples).
Recognizing this issue, the community has developed many empirical defenses
against both attacks and, more recently, certification methods with provable
guarantees against inference-time attacks. However, such guarantees are still
largely lacking for training-time attacks. In this work, we present FullCert,
the first end-to-end certifier with sound, deterministic bounds, which proves
robustness against both training-time and inference-time attacks. We first
bound all possible perturbations an adversary can make to the training data
under the considered threat model. Using these constraints, we bound the
perturbations' influence on the model's parameters. Finally, we bound the
impact of these parameter changes on the model's prediction, resulting in joint
robustness guarantees against poisoning and adversarial examples. To facilitate
this novel certification paradigm, we combine our theoretical work with
BoundFlow, a new open-source library that enables model training on bounded
datasets. We experimentally demonstrate FullCert's feasibility on two datasets.
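To make the bound-propagation idea in the abstract concrete, here is a minimal, self-contained sketch of how sound deterministic bounds can be carried through training with interval arithmetic, shown for a single SGD step on a one-dimensional least-squares model. Everything here (the `Interval` class, `sgd_step`, the toy data) is hypothetical illustration, not the BoundFlow API or FullCert's actual implementation.

```python
# Illustrative sketch only: propagating sound interval bounds through
# training. The names below (Interval, sgd_step, ...) are hypothetical
# and do NOT reflect the BoundFlow library's API.

from dataclasses import dataclass


@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Sound interval product: take the extremes over all corner products.
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def scale(self, c: float):
        return Interval(self.lo * c, self.hi * c) if c >= 0 else \
               Interval(self.hi * c, self.lo * c)


def sgd_step(w: Interval, x: Interval, y: float, lr: float) -> Interval:
    """One SGD step on the squared loss (w*x - y)^2, where x is an interval
    covering every training point the poisoning adversary could produce.
    Evaluating the gradient 2*x*(w*x - y) in interval arithmetic yields a
    weight interval containing the parameters of every admissible run."""
    y_int = Interval(y, y)
    grad = (x * ((w * x) - y_int)).scale(2.0)
    return w - grad.scale(lr)


# Toy setup: one training point with label y = 2, whose feature x = 1 the
# adversary may perturb by at most +/- 0.1 (the bounded threat model).
w = Interval(0.0, 0.0)              # deterministic initialization
x = Interval(0.9, 1.1)              # all admissible data perturbations
for _ in range(5):
    w = sgd_step(w, x, y=2.0, lr=0.1)
print(f"certified parameter range: [{w.lo:.3f}, {w.hi:.3f}]")
```

In this toy setting, the final interval provably contains the weight of every model reachable under the threat model; applying the same interval evaluation to a test input then bounds the prediction, and a label is certified when the whole output interval falls on one side of the decision boundary. Note that interval over-approximation makes the bounds widen over training steps, which is why end-to-end training-time certification is substantially harder than inference-time certification.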