These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
We study probabilistic safety for Bayesian Neural Networks (BNNs) under
adversarial input perturbations. Given a compact set of input points, $T
\subseteq \mathbb{R}^m$, we study the probability w.r.t. the BNN posterior that
all the points in $T$ are mapped to the same region $S$ in the output space. In
particular, this can be used to evaluate the probability that a network sampled
from the BNN is vulnerable to adversarial attacks. We rely on relaxation
techniques from non-convex optimization to develop a method for computing a
lower bound on probabilistic safety for BNNs, deriving explicit procedures for
the case of interval and linear function propagation techniques. We apply our
methods to BNNs trained on a regression task, airborne collision avoidance, and
MNIST, empirically showing that our approach allows one to certify
probabilistic safety of BNNs with millions of parameters.