Abstract
Machine learning systems increasingly face requirements to remove entire
domains of information, such as toxic language or biases, rather than
individual user data. This task presents a dilemma: full removal of the
unwanted domain data is computationally expensive, while random partial removal
is statistically inefficient. We find that a domain's statistical influence is
often concentrated in a small subset of its data samples, suggesting a path
between ineffective partial removal and unnecessary complete removal. We
formalize this as distributional unlearning: a framework to select a small
subset that balances forgetting an unwanted distribution while preserving a
desired one. Using Kullback-Leibler divergence constraints, we derive the exact
removal-preservation Pareto frontier for exponential families and prove that
models trained on the edited data achieve corresponding log-loss bounds. We
propose a distance-based selection algorithm and show it is quadratically more
sample-efficient than random removal in the challenging low-divergence regime.
Experiments across synthetic, text, and image datasets (Jigsaw, CIFAR-10, SMS
spam) show that our method achieves strong unlearning effects (e.g., halving
initial forget-set accuracy) while deleting 15-82% less data than full removal.
Ultimately, by showing a small forget set often suffices, our framework lays
the foundations for more scalable and rigorous subpopulation unlearning.
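
As a rough illustration of the idea of distance-based selection, the sketch
below uses a one-dimensional Gaussian toy case with unit variance, where the
KL divergence between two distributions reduces to half the squared difference
of their means. Everything in it is an assumption made for illustration rather
than the procedure from the paper: the helper names (`gaussian_kl`,
`select_by_distance`, `fit_mean_after_removal`), the choice to delete the
unwanted samples that lie farthest from the desired distribution's mean, the
sample sizes, and the deletion budget `k`. It compares this rule against
random deletion of the same number of samples.

```python
import random
import statistics


def gaussian_kl(mu_a, mu_b, sigma=1.0):
    """KL(N(mu_a, sigma^2) || N(mu_b, sigma^2)) for equal variances."""
    return (mu_a - mu_b) ** 2 / (2 * sigma ** 2)


def select_by_distance(forget, keep_mean, k):
    """Indices of the k forget samples farthest from keep_mean (assumed rule)."""
    order = sorted(
        range(len(forget)),
        key=lambda i: abs(forget[i] - keep_mean),
        reverse=True,
    )
    return set(order[:k])


def fit_mean_after_removal(forget, keep, removed):
    """Maximum-likelihood mean of a Gaussian fit to the edited dataset."""
    remaining = [x for i, x in enumerate(forget) if i not in removed]
    return statistics.fmean(remaining + keep)


rng = random.Random(0)
forget = [rng.gauss(3.0, 1.0) for _ in range(500)]  # unwanted domain (toy)
keep = [rng.gauss(0.0, 1.0) for _ in range(500)]    # desired domain (toy)
forget_mean = statistics.fmean(forget)
keep_mean = statistics.fmean(keep)

k = 100  # hypothetical deletion budget
dist_removed = select_by_distance(forget, keep_mean, k)
rand_removed = set(rng.sample(range(len(forget)), k))

mu_dist = fit_mean_after_removal(forget, keep, dist_removed)
mu_rand = fit_mean_after_removal(forget, keep, rand_removed)

# Removal: divergence from the unwanted distribution (larger is better).
removal_dist = gaussian_kl(forget_mean, mu_dist)
removal_rand = gaussian_kl(forget_mean, mu_rand)
# Preservation: divergence from the desired distribution (smaller is better).
preservation_dist = gaussian_kl(keep_mean, mu_dist)
preservation_rand = gaussian_kl(keep_mean, mu_rand)

print(f"removal:      distance={removal_dist:.3f}  random={removal_rand:.3f}")
print(f"preservation: distance={preservation_dist:.3f}  random={preservation_rand:.3f}")
```

In this toy setting, deleting the samples farthest from the desired mean pulls
the refit model away from the unwanted distribution and toward the desired one
more than a random deletion of the same size does. This only mirrors the
qualitative claim above and says nothing about the size of the gap on real
data.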