Machine unlearning is the process through which a deployed machine learning
model is made to forget about some of its training data points. While naively
retraining the model from scratch is possible, it incurs a large computational
overhead for deep learning models. Thus, several approaches to approximate
unlearning have been proposed, along with corresponding
metrics that formalize what it means for a model to forget about a data point.
In this work, we first taxonomize approaches and metrics of approximate
unlearning. From this taxonomy, we identify verification error, i.e., the L2
distance between the weights of an approximately unlearned model and a naively
retrained model, as an approximate unlearning metric worth optimizing
because it subsumes a large class of other metrics. We theoretically analyze the
canonical training algorithm, stochastic gradient descent (SGD), to surface the
variables that are relevant to reducing the verification error of approximate
unlearning for SGD. From this analysis, we first derive an easy-to-compute
proxy for verification error (termed unlearning error). The analysis also
informs the design of a new training objective penalty that limits the overall
change in weights during SGD and thereby facilitates approximate unlearning
with lower verification error. We validate our theoretical work through an
empirical evaluation on CIFAR-10, CIFAR-100, and IMDB sentiment analysis.
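The two quantities named above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's method: the function names, the quadratic penalty form lam * ||w - w_init||^2, and all hyperparameters are assumptions made here for concreteness.

```python
import numpy as np

# Verification error, as defined in the abstract: the L2 distance between
# the (flattened) weights of an approximately unlearned model and those of
# a model naively retrained from scratch without the forgotten points.
def verification_error(w_unlearned, w_retrained):
    return np.linalg.norm(w_unlearned - w_retrained)

# Hypothetical SGD step with a weight-change penalty. On top of the task
# gradient, we add the gradient of lam * ||w - w_init||^2, which is
# 2 * lam * (w - w_init). Penalties of this kind limit how far the weights
# drift during training; the exact penalty used in the paper may differ.
def sgd_step_with_penalty(w, grad_task, w_init, lr=0.1, lam=0.01):
    grad_penalty = 2.0 * lam * (w - w_init)
    return w - lr * (grad_task + grad_penalty)

# Toy usage: least-squares regression on random data, tracking how far the
# weights move from their initialization under the penalized SGD updates.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w_init = np.zeros(4)
w = w_init.copy()
for _ in range(100):
    grad = 2.0 / len(y) * X.T @ (X @ w - y)  # gradient of mean squared error
    w = sgd_step_with_penalty(w, grad, w_init)
print(verification_error(w, w_init))  # overall drift from initialization
```

In a deep learning setting one would flatten and concatenate all parameter tensors before taking the norm; the scalar example above only conveys the shape of the computation.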