We develop a Distributionally Robust Optimization (DRO) formulation for
Multiclass Logistic Regression (MLR) that can tolerate data contaminated by
outliers. The DRO framework uses a probabilistic ambiguity set defined as a
ball of distributions that are close to the empirical distribution of the
training set in the sense of the Wasserstein metric. We relax the DRO
formulation into a regularized learning problem whose regularizer is a norm of
the coefficient matrix. We establish out-of-sample performance guarantees for
the solutions to our model, offering insights on the role of the regularizer in
controlling the prediction error. We apply the proposed method to render
deep Vision Transformer (ViT)-based image classifiers robust to random and
adversarial attacks. Specifically, by adopting a novel random training
method, we demonstrate on the MNIST and CIFAR-10 datasets reductions of up to
83.5% in test error rate and up to 91.3% in loss compared with baseline
methods.
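
To make the shape of the relaxed problem concrete, the following is a minimal sketch of a norm-regularized multiclass logistic loss: the average log-loss of a softmax model plus a penalty on a norm of the coefficient matrix. It is an illustration under stated assumptions, not the formulation derived in the paper. The function name `regularized_mlr_loss`, the weight `epsilon` (standing in for the role played by the ambiguity-set radius), and the default Frobenius norm are all hypothetical choices; the abstract says only that the regularizer is "a norm of the coefficient matrix".

```python
import numpy as np


def regularized_mlr_loss(B, X, y, epsilon, ord="fro"):
    """Average softmax log-loss plus epsilon times a norm of B.

    B: (d, K) coefficient matrix, X: (n, d) features, y: (n,) integer labels.
    Assumption: the Frobenius norm and the weight `epsilon` are illustrative;
    the specific norm and constant used in the paper are not stated here.
    """
    logits = X @ B
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class, averaged over samples.
    nll = -log_probs[np.arange(X.shape[0]), y].mean()
    return nll + epsilon * np.linalg.norm(B, ord=ord)
```

Under this sketch, a larger `epsilon` penalizes large coefficient matrices more heavily, which is the mechanism through which a norm regularizer can limit sensitivity to perturbed inputs.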