We study a pitfall in the typical workflow for differentially private machine
learning. The use of differentially private learning algorithms in a "drop-in"
fashion -- without accounting for the impact of differential privacy (DP) noise
when choosing feature engineering operations, selecting features, or designing
the neural network architecture -- yields overly complex and poorly performing
models. In other words, had the impact of DP noise been anticipated, a simpler
and more accurate model could have been trained for the same privacy
guarantee. We systematically study this phenomenon through
theory and experiments. On the theory front, we provide an explanatory
framework and prove that the phenomenon arises naturally from the addition of
noise to satisfy differential privacy.
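For intuition only -- this is a minimal illustrative sketch, not the mechanism
analyzed in our proofs, and the function name and parameters are hypothetical
-- the noise-addition step in DP-SGD-style training clips each per-example
gradient and adds Gaussian noise whose total magnitude grows with the number
of model parameters:

```python
import numpy as np

def dp_noisy_mean(grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Average per-example gradients with clipping and Gaussian noise,
    the core noise-addition step of DP-SGD-style training.

    grads: array of shape (n_examples, n_params). Illustrative sketch only;
    the privacy accounting that maps noise_multiplier to (epsilon, delta)
    is omitted. Each coordinate receives noise with standard deviation
    clip_norm * noise_multiplier, so the total injected noise grows with
    n_params -- one intuition for why larger models pay a larger accuracy
    cost at a fixed privacy budget.
    """
    rng = rng or np.random.default_rng()
    # Clip each per-example gradient to norm at most clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to the clipping norm, then average.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, clip_norm * noise_multiplier, size=grads.shape[1]
    )
    return noisy_sum / len(grads)
```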
On the experimental front, we demonstrate how the phenomenon manifests in
practice across a range of datasets, model types, tasks, and neural network
architectures. We also analyze the
factors that contribute to the problem and distill our experimental insights
into concrete takeaways that practitioners can follow when training models with
differential privacy. Finally, we propose privacy-aware algorithms for feature
selection and neural network architecture search. We analyze their differential
privacy properties and evaluate them empirically.
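To make privacy-aware feature selection concrete, the sketch below shows one
standard ingredient such an algorithm can build on: top-k selection via the
exponential mechanism with an evenly split budget. It is an illustration under
the assumption of per-feature utility scores with bounded sensitivity, not a
reproduction of the specific algorithms we propose:

```python
import numpy as np

def private_top_k_features(scores, k, epsilon, sensitivity=1.0, rng=None):
    """Select k features by repeatedly applying the exponential mechanism
    ("peeling"), spending epsilon / k on each pick; by basic composition
    the whole selection is epsilon-differentially private.

    scores: per-feature utility scores, each assumed to change by at most
    `sensitivity` when one record is added to or removed from the data.
    """
    rng = rng or np.random.default_rng()
    eps_per_pick = epsilon / k
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(k):
        s = np.asarray([scores[i] for i in remaining], dtype=float)
        # Exponential mechanism: P(i) proportional to
        # exp(eps * u_i / (2 * sensitivity)). Subtracting the max before
        # exponentiating is for numerical stability only.
        logits = eps_per_pick * (s - s.max()) / (2.0 * sensitivity)
        probs = np.exp(logits)
        probs /= probs.sum()
        pick = rng.choice(len(remaining), p=probs)
        chosen.append(remaining.pop(pick))
    return chosen
```

For example, private_top_k_features(scores, k=20, epsilon=1.0) privately picks
20 features. Splitting the budget evenly across picks is the simplest
composition strategy; a noise-aware workflow would also choose k itself in
light of the available budget.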