Machine-learning models for security-critical applications, such as bot,
malware, or spam detection, operate in constrained discrete domains. These
applications would benefit from having provable guarantees against adversarial
examples. The existing literature on provable adversarial robustness of
models, however, focuses exclusively on robustness to gradient-based attacks
in domains such as images. These attacks model the adversarial cost, e.g.,
the amount of distortion applied to an image, as a $p$-norm. We argue that
this approach is ill-suited to modeling adversarial costs in constrained
domains, where not all examples are feasible.
We introduce a graphical framework that (1) generalizes existing attacks in
discrete domains, (2) can accommodate complex cost functions beyond $p$-norms,
including financial cost incurred when attacking a classifier, and (3)
efficiently produces valid adversarial examples with guarantees of minimal
adversarial cost. These guarantees directly translate into a notion of
adversarial robustness that takes into account domain constraints and the
adversary's capabilities. We show how our framework can be used to evaluate
security by crafting adversarial examples that evade a Twitter-bot detection
classifier with a provably minimal number of changes, and to build privacy
defenses by crafting adversarial examples that evade a privacy-invasive
website-fingerprinting classifier.
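
The minimal-cost guarantee in (3) follows from classical graph search: when
adversarial examples are reachable through a graph of feasible discrete
transformations with non-negative step costs, uniform-cost (Dijkstra-style)
search returns the first adversarial example it pops at provably minimal
total cost. The following Python sketch illustrates this idea only; the
\texttt{is\_adversarial} and \texttt{expand} callbacks and the toy linear
classifier over binary features are hypothetical stand-ins, not the paper's
implementation.

\begin{verbatim}
import heapq

def minimal_cost_adversarial(x0, is_adversarial, expand):
    # Uniform-cost search over the transformation graph. With
    # non-negative step costs, the first goal node popped from the
    # priority queue is reached at minimal total cost.
    frontier = [(0.0, x0)]
    best = {x0: 0.0}
    while frontier:
        cost, x = heapq.heappop(frontier)
        if cost > best.get(x, float("inf")):
            continue  # stale queue entry
        if is_adversarial(x):
            return x, cost  # minimal-cost adversarial example
        for x_next, step_cost in expand(x):
            new_cost = cost + step_cost
            if new_cost < best.get(x_next, float("inf")):
                best[x_next] = new_cost
                heapq.heappush(frontier, (new_cost, x_next))
    return None, float("inf")  # no feasible adversarial example

# Toy domain (hypothetical): binary features, a linear score, and
# feasible transformations that flip one feature at unit cost.
weights = (2.0, -1.0, 0.5, 1.5)

def is_adversarial(x):
    # Target condition: drive the classifier's score negative.
    return sum(w * v for w, v in zip(weights, x)) < 0

def expand(x):
    # Feasible moves: flip any single bit; each flip costs 1.
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - y[i]
        yield tuple(y), 1.0

adv, cost = minimal_cost_adversarial((1, 0, 1, 1), is_adversarial, expand)
print(adv, cost)  # (0, 1, 1, 0) 3.0
\end{verbatim}

Because the search accepts an arbitrary non-negative step-cost function,
the same sketch accommodates costs beyond $p$-norms, e.g., the financial
cost of each attack action, mirroring claim (2) above.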