As we rely on machine learning (ML) models to make more consequential
decisions, the issue of ML models perpetuating or even exacerbating undesirable
historical biases (e.g., gender and racial biases) has come to the forefront of
public attention. In this paper, we focus on the problem of detecting
violations of individual fairness in ML models. We formalize the problem as
measuring the susceptibility of ML models to a form of adversarial attack
and develop a suite of inference tools for the adversarial cost function. The
tools allow auditors to assess the individual fairness of ML models in a
statistically principled way: form confidence intervals for the worst-case
performance differential between similar individuals and test hypotheses of
model fairness with (asymptotic) non-coverage/Type I error rate control. We
demonstrate the utility of our tools in a real-world case study.
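
The following is a minimal, hypothetical sketch of the auditing idea summarized above: for each test point, search for a "similar" individual (here, a small perturbation within an L2 ball standing in for a fair metric) that maximizes the loss differential, then form a normal-approximation confidence interval for the mean worst-case differential and test it against a tolerance. The function names, the surrogate fair metric, and the gradient-ascent attack are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def worst_case_differential(model_loss, grad_loss, x, y, eps=0.1, steps=20, lr=0.01):
    """Gradient ascent on the loss within an L2 ball of radius eps
    (a stand-in for the fair metric defining 'similar' individuals)."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + lr * grad_loss(x_adv, y)
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > eps:                        # project back onto the ball
            x_adv = x + delta * (eps / norm)
    return model_loss(x_adv, y) - model_loss(x, y)

def audit(model_loss, grad_loss, X, Y, tol=0.05, alpha=0.05):
    """Confidence interval and one-sided test for the mean worst-case
    loss differential between similar individuals."""
    diffs = np.array([worst_case_differential(model_loss, grad_loss, x, y)
                      for x, y in zip(X, Y)])
    n = len(diffs)
    mean = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(n)
    ci = (mean - 1.96 * se, mean + 1.96 * se)   # approx. 95% normal CI
    # Reject "the model is individually fair (differential <= tol)" when
    # the one-sided lower confidence bound exceeds the tolerance.
    reject = (mean - 1.645 * se) > tol          # one-sided test at alpha = 0.05
    return mean, ci, reject
```

In this sketch, `model_loss(x, y)` and `grad_loss(x, y)` are assumed callables returning the model's loss at a point and its gradient with respect to the input; the asymptotic normal approximation is what gives the (asymptotic) non-coverage/Type I error rate control referred to in the abstract.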