Mixed Sample Data Augmentation (MSDA) has received increasing attention in
recent years, with many successful variants such as MixUp and CutMix. By
studying the mutual information between the function learned by a VAE on the
original data and on the augmented data we show that MixUp distorts learned
functions in a way that CutMix does not. We further demonstrate this by showing
that MixUp acts as a form of adversarial training, increasing robustness to
attacks such as DeepFool and Uniform Noise, which produce examples similar to
those generated by MixUp. We argue that this distortion prevents models from
learning about sample-specific features in the data, aiding generalisation
performance. In contrast, we suggest that CutMix works more like a traditional
augmentation, improving performance by preventing memorisation without
distorting the data distribution. However, we argue that an MSDA which builds
on CutMix to include masks of arbitrary shape, rather than just square, could
further prevent memorisation whilst preserving the data distribution in the
same way. To this end, we propose FMix, an MSDA that uses random binary masks
obtained by applying a threshold to low frequency images sampled from Fourier
space. These random masks can take on a wide range of shapes and can be
generated for use with one, two, and three dimensional data. FMix improves
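The mask construction described above can be sketched in a few lines of NumPy: sample Gaussian noise in Fourier space, attenuate high frequencies, invert the transform, and threshold the result so that a chosen proportion of elements is retained. This is an illustrative sketch, not the reference implementation (which is in the linked repository); the parameter names `lam` (mixing proportion) and `decay` (low-pass decay power) are our own labels for illustration.

```python
import numpy as np

def fmix_mask(shape=(32, 32), lam=0.5, decay=3.0, rng=None):
    """Sketch: binary mask from a thresholded low-frequency image.

    Sample a complex Gaussian spectrum, attenuate it by 1/f**decay,
    take the inverse FFT to obtain a low-frequency grey-scale image,
    then keep the top `lam` proportion of elements as the mask.
    Works for 1-, 2-, or 3-dimensional `shape`.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Complex Gaussian noise in Fourier space
    spectrum = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    # Radial frequency magnitude for each element of the spectrum
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    radius = np.sqrt(sum(f ** 2 for f in freqs))
    radius[tuple([0] * len(shape))] = 1.0 / max(shape)  # avoid divide-by-zero at DC
    # Low-pass filter, then back to the spatial domain
    low_freq = np.real(np.fft.ifftn(spectrum / radius ** decay))
    # Threshold so that round(lam * size) elements are set to 1
    n_ones = max(1, int(round(lam * low_freq.size)))
    threshold = np.sort(low_freq.ravel())[-n_ones]
    return (low_freq >= threshold).astype(np.float32)

# Two samples x1, x2 of matching shape would then be mixed as
#   mixed = mask * x1 + (1 - mask) * x2
```

Because the filtered image is smooth, the super-level set selected by the threshold forms contiguous regions of arbitrary shape, and `decay` controls how coarse those regions are.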
performance over MixUp and CutMix, without an increase in training time, for a
number of models across a range of data sets and problem settings, obtaining a
new single-model state-of-the-art result on CIFAR-10 without external data.
Finally, we show that a consequence of the difference between interpolating
MSDA such as MixUp and masking MSDA such as FMix is that the two can be
combined to improve performance even further. Code for all experiments is
provided at https://github.com/ecs-vlc/FMix .