Machine learning models are now widely deployed in real-world applications.
However, the existence of adversarial examples has long been considered a real
threat to such models. While numerous defenses aiming to improve robustness
have been proposed, many have been shown to be ineffective. As these vulnerabilities
are still nowhere near being eliminated, we propose an alternative
deployment-based defense paradigm that goes beyond the traditional white-box
and black-box threat models. Instead of training a single partially-robust
model, one could train a set of same-functionality, yet adversarially-disjoint,
models with minimal inter-model attack transferability. These models could then
be randomly and individually deployed, such that accessing one of them
minimally affects the others. Our experiments on CIFAR-10 and a wide range of
attacks show that we achieve a significantly lower attack transferability
across our disjoint models compared to an ensemble-diversity baseline. In
addition, compared to an adversarially trained set, we achieve a higher average
robust accuracy while maintaining the accuracy on clean examples.
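
As a rough illustration of the idea (the abstract does not spell out the training objective), the sketch below jointly trains two models on the same task while penalizing the cosine similarity of their input gradients, one generic way to discourage attacks crafted on one model from transferring to the other. The names (`disjointness_penalty`, `joint_step`, `lam`) and the specific penalty are hypothetical assumptions for illustration, not necessarily the paper's loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def disjointness_penalty(model_a, model_b, x, y):
    # Cosine similarity between the two models' loss gradients w.r.t. the input;
    # driving this down discourages shared adversarial directions (an assumed proxy
    # for low attack transferability).
    x = x.clone().requires_grad_(True)
    grad_a = torch.autograd.grad(F.cross_entropy(model_a(x), y), x, create_graph=True)[0]
    grad_b = torch.autograd.grad(F.cross_entropy(model_b(x), y), x, create_graph=True)[0]
    return F.cosine_similarity(grad_a.flatten(1), grad_b.flatten(1), dim=1).mean()

def joint_step(model_a, model_b, optimizer, x, y, lam=1.0):
    # One update: classification loss for both models plus the disjointness term.
    optimizer.zero_grad()
    task_loss = F.cross_entropy(model_a(x), y) + F.cross_entropy(model_b(x), y)
    loss = task_loss + lam * disjointness_penalty(model_a, model_b, x, y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage on random data; in the paper's setting these would be CIFAR-10 classifiers.
    model_a, model_b = nn.Linear(32, 10), nn.Linear(32, 10)
    opt = torch.optim.SGD(list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)
    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
    print(joint_step(model_a, model_b, opt, x, y))
```

At deployment, each user or endpoint would then be served one model from the trained set at random, so that access to (or compromise of) one model yields adversarial examples that transfer poorly to the others.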