This is a B.Tech thesis report on the detection and purification of adversarially
attacked images. A deep learning model is trained on a set of training examples
for tasks such as classification and regression. During training, the model's
weights are adjusted so that it performs the task well not only on the training
examples, as judged by a chosen metric, but also generalizes to unseen examples,
typically called the test data.
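In standard notation (introduced here only for illustration), training can be viewed
as empirical risk minimization over the model parameters $\theta$:

\[
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_{\theta}(x_i),\, y_i\big),
\]

where $f_{\theta}$ is the model, $\ell$ is a task-specific loss such as cross-entropy
for classification, and $(x_i, y_i)$ are the $N$ training pairs; generalization is
then judged by the same metric on the held-out test data.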
Despite the huge success of machine learning models on a wide range of tasks,
their security has received far less attention over the years. Robustness against
potential cyber attacks should be an evaluation criterion for machine learning
models, alongside accuracy. Such attacks can have serious consequences in the
sensitive real-world applications where machine learning is deployed, such as
medical and transportation systems. Hence, securing these systems against such
attacks is a necessity. In this report, I focus on a class of these cyber attacks
called adversarial attacks, in which the original input sample is modified by
small perturbations so that it still looks visually identical to a human being,
yet the machine learning model is fooled into an incorrect prediction.
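As a concrete, standard illustration of such an attack (not one of the methods
proposed in this report), the Fast Gradient Sign Method (FGSM) of Goodfellow et al.
perturbs an input along the sign of the loss gradient. A minimal PyTorch sketch,
assuming a differentiable classifier model, a loss function loss_fn, and images
scaled to [0, 1]:

import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.03):
    # FGSM: take one step of size eps along the sign of the input
    # gradient of the loss, which tends to increase the loss the most
    # per unit of L-infinity perturbation.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # eps bounds the L-infinity norm, keeping the change visually small
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

Even a small eps (here 0.03, an arbitrary placeholder) is often enough to flip
a classifier's prediction while leaving the image unchanged to the human eye.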
In this report, I discuss two novel ways to counter adversarial attacks using
AutoEncoders: 1) detecting the presence of adversaries, and 2) purifying
adversarial inputs so that the target classification model becomes robust
against such attacks.
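As a rough illustration of these two ideas, not the exact architectures developed
in this report, a convolutional AutoEncoder can be trained to reconstruct clean
images; at test time its reconstruction error can flag adversaries (detection),
and its output can be fed to the classifier in place of the raw input
(purification). A minimal PyTorch sketch, with the class name, all layer sizes,
and the threshold chosen arbitrarily for illustration:

import torch
import torch.nn as nn

class PurifyingAutoEncoder(nn.Module):
    # Encoder compresses the image; decoder reconstructs it, ideally
    # stripping small adversarial perturbations along the way.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2,
                               padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 3, stride=2,
                               padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_adversarial(ae, x, threshold=0.01):
    # Detection: flag inputs whose per-image reconstruction error is
    # unusually large; the threshold would be tuned on clean validation
    # data (the value here is a placeholder).
    with torch.no_grad():
        err = ((ae(x) - x) ** 2).mean(dim=(1, 2, 3))
    return err > threshold

In the purification setting, the classifier would receive ae(x) instead of x, so
that small perturbations are ideally removed before the prediction is made.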