Machine learning can affect people in ways that carry legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, past decisions were unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Because these biased decisions are recorded in the very data on which predictors are trained, machine learning must account for this bias to avoid perpetuating or creating discriminatory practices. In this paper, we develop a
framework for modeling fairness using tools from causal inference. Our
definition of counterfactual fairness captures the intuition that a decision is
fair towards an individual if it is the same in (a) the actual world and (b) a
counterfactual world in which the individual belonged to a different demographic
group. We demonstrate our framework on a real-world problem of fair prediction
of success in law school.
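To make the counterfactual comparison concrete, the intuition can be formalized in the notation of structural causal models (a sketch; the symbols $\hat{Y}$ for the predictor, $A$ for the protected attribute, $X$ for the remaining observed features, and $U$ for latent background variables are introduced here for illustration). A predictor $\hat{Y}$ is counterfactually fair if, for every context $X = x$ and $A = a$,

\[
P\big(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\big)
= P\big(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\big)
\]

for all outcomes $y$ and every value $a'$ attainable by $A$, where $\hat{Y}_{A \leftarrow a'}$ denotes the prediction in the counterfactual world in which $A$ is set to $a'$.

As a minimal sketch of how such a comparison could be carried out, the following Python example evaluates a predictor in the actual and counterfactual worlds of a toy linear structural causal model (the variable names, coefficients, and predictor are hypothetical, chosen only to make the computation concrete):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy structural causal model (illustrative only):
#   U: latent background variable (e.g., unobserved aptitude)
#   A: binary protected attribute
#   X: observed feature, a descendant of both U and A
U = rng.normal(size=n)
A = rng.integers(0, 2, size=n)
noise_X = rng.normal(scale=0.1, size=n)
X = 1.5 * U + 0.8 * A + noise_X

def predict(x):
    # A hypothetical predictor that thresholds the observed feature X.
    return (x > 0.4).astype(int)

# Counterfactual world: hold U and the exogenous noise fixed, intervene
# on A (here, flip it), and regenerate the descendants of A.
X_cf = 1.5 * U + 0.8 * (1 - A) + noise_X

# A counterfactually fair predictor gives identical predictions in both
# worlds; any disagreement reveals a dependence on A.
flipped = np.mean(predict(X) != predict(X_cf))
print(f"fraction of individuals whose prediction flips: {flipped:.3f}")
```

Because $X$ is a descendant of $A$ in this toy model, a predictor built on $X$ alone flips for some individuals, whereas one built only from non-descendants of $A$ (here, $U$) is counterfactually fair by construction.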