We describe a procedure for removing the dependency of a trained deep network
on a cohort of its training data. The procedure improves upon previous methods,
generalizes them to different readout functions, and can be extended to ensure
forgetting in the activations of the network. We introduce a new bound on how
much information
can be extracted per query about the forgotten cohort from a black-box network
for which only the input-output behavior is observed. The proposed forgetting
procedure has a deterministic part derived from the differential equations of a
linearized version of the model, and a stochastic part that ensures information
destruction by adding noise tailored to the geometry of the loss landscape. To
compute the information in the activations, we exploit connections between the
activation and weight dynamics of a deep neural network (DNN), inspired by
Neural Tangent Kernels.
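
To make the two-part structure of the forgetting procedure concrete, the
following is a minimal sketch on a toy problem, not the method as specified in
the paper. It assumes a linear model with a ridge-regularized squared loss, for
which the linearization is exact and the deterministic part reduces to a single
Newton step toward the minimizer of the loss on the retained data. The
stochastic part is illustrated with Gaussian noise whose covariance is
proportional to the inverse Hessian, which is one possible way to tailor noise
to the loss geometry. The function name `forget_linearized` and the parameters
`sigma` and `ridge` are hypothetical.

```python
import numpy as np

def forget_linearized(w, X_retain, y_retain, sigma=0.0, ridge=1e-3, rng=None):
    """Scrub a cohort from weights w of a linear least-squares model.

    Illustrative assumption: the loss on the retained data is
    L(w) = ||X w - y||^2 / (2 n) + ridge * ||w||^2 / 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X_retain.shape[0]

    # Hessian and gradient of the retained-data loss at the current weights.
    H = X_retain.T @ X_retain / n + ridge * np.eye(w.size)
    g = X_retain.T @ (X_retain @ w - y_retain) / n + ridge * w

    # Deterministic part: one Newton step, which is exact for a quadratic loss
    # and lands on the weights obtained by training on the retained data only.
    w_det = w - np.linalg.solve(H, g)

    # Stochastic part (assumed form): Gaussian noise with covariance
    # sigma^2 * H^{-1}, so flat directions of the loss receive more noise.
    evals, evecs = np.linalg.eigh(H)
    noise = evecs @ (rng.standard_normal(w.size) / np.sqrt(evals))
    return w_det + sigma * noise
```

With `sigma=0.0` the returned weights coincide, up to numerical precision, with
those of a model refit on the retained data alone. For a deep network the loss
is not quadratic, so the Newton step would only be an approximation based on
the linearized model.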