In this appraisal paper, we evaluate the efficacy of SHIELD, a
compression-based framework published at KDD 2018 for defending image
classification models against adversarial attacks. Here, we consider
alternative threat models not studied in the original work, where we assume
that an adaptive adversary is aware of the ensemble defense approach, the
defensive pre-processing, and the architecture and weights of the models used
in the ensemble. We define scenarios with varying levels of threat and
empirically analyze the proposed defense by varying the degree of information
available to the attacker, spanning from a full white-box attack to the
gray-box threat model described in the original work. To evaluate the
robustness of the defense against an adaptive attacker, we consider the
success rate of the targeted Projected Gradient Descent (PGD) attack,
a strong gradient-based attack from the adversarial machine
learning literature. We also experiment with training the SHIELD ensemble
from scratch, which is different from re-training using a pre-trained model as
done in the original work. We find that the targeted PGD attack has a success
rate of 64.3% against the original SHIELD ensemble in the full white-box
scenario, but this drops to 48.9% when the models in the ensemble are
trained from scratch instead of being re-trained. Our experiments further reveal
that the models of a re-trained ensemble are indeed more strongly correlated in
the cosine-similarity space, and that models trained from scratch are less
vulnerable to targeted attacks in both the white-box and gray-box scenarios.
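As a minimal illustration of the attack used throughout, the following is a hedged sketch of targeted PGD on a toy linear softmax classifier rather than the SHIELD models; the step size, L-infinity radius, and iteration count are illustrative assumptions, not values from the paper:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the logit vector z.
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_pgd(x, W, b, target, eps=0.1, alpha=0.01, steps=40):
    """Targeted PGD against a toy linear softmax classifier (toy sketch).

    Descends the cross-entropy loss of the *target* class with
    signed-gradient steps, projecting back into the L-infinity ball of
    radius eps around the original input x after every step.
    """
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        # Gradient of -log p_target w.r.t. the input for a linear model:
        # W^T (p - onehot(target)).
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        grad = W.T @ (p - onehot)
        x_adv = x_adv - alpha * np.sign(grad)        # step toward the target class
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)   # project into the eps-ball
    return x_adv
```

Against the actual SHIELD defense, the loss would be aggregated over the ensemble members (and, in the full white-box case, differentiated through the defensive pre-processing); this sketch shows only the step-and-project structure of PGD.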