Recent advances in autoencoders and generative models have given rise to
effective video forgery methods, used for generating so-called "deepfakes".
Mitigation research is mostly focused on post-factum deepfake detection and not
on prevention. We complement these efforts by introducing a novel class of
adversarial attacks---training-resistant attacks---which can disrupt
face-swapping autoencoders whether or not their adversarial images have been
included in the training set of said autoencoders. We propose the Oscillating
GAN (OGAN) attack, an attack optimized to be training-resistant, which
introduces spatio-temporal distortions to the output of face-swapping
autoencoders. To implement OGAN, we construct a bilevel optimization problem,
where we train a generator and a face-swapping model instance against each
other. Specifically, we pair each input image with a target distortion, and
feed them into a generator that produces an adversarial image. This image will
exhibit the distortion when a face-swapping autoencoder is applied to it. We
solve the optimization problem by training the generator and the face-swapping
model simultaneously using an iterative process of alternating optimization.
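The alternating scheme can be illustrated with a minimal sketch, using toy linear maps in place of the real generator and face-swapping autoencoder. The model forms, shapes, loss functions, and step sizes below are illustrative assumptions only, not the authors' implementation.

```python
# Toy sketch of the alternating (bilevel) optimization: a generator G and a
# face-swapping model F are trained against each other. All specifics here
# (linear models, losses, learning rate) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

dim = 4
x = rng.normal(size=dim)                  # input "image" (toy vector)
d = np.ones(dim)                          # paired target distortion
F_w = 0.1 * rng.normal(size=(dim, dim))   # face-swap model (toy linear map)
G_w = np.zeros((dim, dim))                # generator (toy linear perturbation)

lr = 0.01
for step in range(100):
    # Face-swap step: F trains to reconstruct the adversarial image,
    # minimizing ||F(adv) - adv||^2 / 2 with respect to F.
    adv = x + G_w @ x                     # adversarial image produced by G
    grad_F = np.outer(F_w @ adv - adv, adv)
    F_w -= lr * grad_F

    # Generator step: G trains so that F's output exhibits the target
    # distortion, minimizing ||F(adv) - (x + d)||^2 / 2 with respect to G.
    adv = x + G_w @ x
    grad_G = np.outer(F_w.T @ (F_w @ adv - (x + d)), x)
    G_w -= lr * grad_G

adv = x + G_w @ x                         # final adversarial image
```

The two objectives pull in opposite directions, which mirrors the adversarial pairing in the text: the face-swap model adapts to the adversarial inputs while the generator keeps steering the model's output toward the target distortion.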
Next, we analyze the previously published Distorting Attack and show it is
training-resistant, though it is outperformed by our proposed OGAN. Finally,
we validate both attacks using a popular implementation of FaceSwap, and show
that they transfer across different target models and target faces, including
faces the adversarial attacks were not trained on. More broadly, these results
demonstrate the existence of training-resistant adversarial attacks,
potentially applicable to a wide range of domains.