Having similar behavior at training time and test time, which we call a
"What You See Is What You Get" (WYSIWYG) property, is desirable in machine
learning. Models trained with standard stochastic gradient descent (SGD),
however, do not necessarily have this property, as their complex behaviors such
as robustness or subgroup performance can differ drastically between training
and test time. In contrast, we show that differentially private (DP) training
provably ensures the high-level WYSIWYG property, which we quantify using a
notion of distributional generalization. Applying this connection, we introduce
new conceptual tools for designing deep-learning methods by reducing
generalization concerns to optimization ones: to mitigate unwanted behavior at
test time, it is provably sufficient to mitigate this behavior on the training
data. By applying this novel design principle, which bypasses "pathologies" of
SGD, we construct simple algorithms that are competitive with the state of the art in several
distributional-robustness applications, significantly improve the privacy vs.
disparate impact trade-off of DP-SGD, and mitigate robust overfitting in
adversarial training. Finally, we also improve on theoretical bounds relating
DP, stability, and distributional generalization.