CNNs achieve remarkable performance by leveraging deep, over-parametrized
architectures trained on large datasets. However, they generalize poorly to
data outside the training domain and lack robustness to noise and adversarial
attacks. By building better inductive biases, we can improve robustness and
also obtain smaller networks that are more efficient in both memory and
computation. While standard CNNs use matrix
computations, we study tensor layers that involve higher-order computations and
provide a better inductive bias. Specifically, we impose low-rank tensor
structures on the weights of tensor regression layers to obtain compact
networks, and propose tensor dropout, a randomization of the tensor rank that
improves robustness (see the sketch below). We show that our approach
outperforms competing methods for large-scale
image classification on ImageNet and CIFAR-100. We establish a new
state-of-the-art accuracy for phenotypic trait prediction on the UK Biobank
brain MRI dataset, the largest dataset of its kind, where multi-linear
structure is paramount. In all cases, we demonstrate superior performance and
significantly improved robustness, both to noisy inputs and to adversarial
attacks. We ground our approach theoretically by establishing the link between
our randomized decomposition and non-linear dropout.
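
To make the mechanism concrete, the following is a minimal PyTorch sketch of a
tensor regression layer whose weight tensor is stored in Tucker form, with
tensor dropout realized as independent Bernoulli masks over the rank dimensions
of the core. The class name, input shape, ranks, and the per-mode
inverted-dropout rescaling are illustrative assumptions, not a reference
implementation.

```python
import torch
import torch.nn as nn


class TuckerTRL(nn.Module):
    """Tensor regression layer with a Tucker-factorized weight tensor and
    tensor dropout on the Tucker rank. Hypothetical sketch: shapes, ranks,
    and the rescaling convention are illustrative assumptions."""

    def __init__(self, in_shape=(64, 8, 8), ranks=(16, 4, 4, 8),
                 n_outputs=10, drop_prob=0.2):
        super().__init__()
        self.drop_prob = drop_prob
        # Tucker core, plus one factor matrix per mode
        # (three activation modes and one output mode).
        self.core = nn.Parameter(torch.randn(*ranks) * 0.02)
        self.factors = nn.ParameterList(
            nn.Parameter(torch.randn(dim, rank) * 0.02)
            for dim, rank in zip((*in_shape, n_outputs), ranks)
        )
        self.bias = nn.Parameter(torch.zeros(n_outputs))

    def forward(self, x):
        core = self.core
        if self.training and self.drop_prob > 0:
            keep = 1.0 - self.drop_prob
            # Tensor dropout: independently drop rank components of the core
            # along every mode, rescaling by 1/keep so the expected weight
            # tensor is unchanged.
            for mode, rank in enumerate(core.shape):
                mask = torch.bernoulli(
                    torch.full((rank,), keep, device=core.device)) / keep
                shape = [1] * core.ndim
                shape[mode] = rank
                core = core * mask.view(shape)
        # Reconstruct the full regression weight
        # W = core x_1 U1 x_2 U2 x_3 U3 x_4 U4 (fixed to 4 modes here).
        w = torch.einsum('abcd,ia,jb,kc,od->ijko', core, *self.factors)
        # Contract the activation tensor with W: y[n, o] = <X[n], W[..., o]> + b[o].
        return torch.einsum('nijk,ijko->no', x, w) + self.bias


# Usage: replaces flatten + fully-connected on a (batch, 64, 8, 8) activation.
layer = TuckerTRL()
y = layer(torch.randn(32, 64, 8, 8))  # -> shape (32, 10)
```

Because each scaled mask has unit expectation, the masked core equals the full
core in expectation; this unbiasedness is what underlies the link to dropout.
At test time the masks are omitted and the full low-rank weights are used.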