Abstract
Differential privacy (DP) is becoming increasingly important for deployed
machine learning applications because it provides strong guarantees for
protecting the privacy of individuals whose data is used to train models.
However, DP mechanisms commonly used in machine learning tend to struggle on
many real-world data distributions, including highly imbalanced and small labeled
training sets. In this work, we propose SWAG-PPM, a new scalable DP mechanism
for deep learning models that uses as its randomized mechanism a pseudo
posterior distribution downweighting each record's likelihood contribution in
proportion to its disclosure risk. As a motivating example from
official statistics, we demonstrate SWAG-PPM on a workplace injury text
classification task using a highly imbalanced public dataset published by the
U.S. Occupational Safety and Health Administration (OSHA). We find that
SWAG-PPM exhibits only modest utility degradation relative to a non-private
comparator while greatly outperforming the industry-standard DP-SGD under a
similar privacy budget.
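To make the weighting idea concrete, here is a minimal sketch of a pseudo-posterior-style downweighting scheme. It is an illustration of the general technique only, not the paper's actual SWAG-PPM algorithm: the risk-to-weight mapping (a simple min-max inversion) and both function names are assumptions, and a real implementation would tie the weights to a formal privacy guarantee.

```python
import numpy as np

def pseudo_posterior_weights(risks):
    # Hypothetical mapping: per-record disclosure risks (higher = riskier)
    # are converted to weights in [0, 1], so the riskiest records
    # contribute least. The min-max inversion here is illustrative only.
    risks = np.asarray(risks, dtype=float)
    spread = risks.max() - risks.min()
    if spread == 0.0:
        return np.ones_like(risks)
    return 1.0 - (risks - risks.min()) / spread

def weighted_log_likelihood(log_liks, weights):
    # Pseudo (weighted) log-likelihood: each record's log-likelihood
    # contribution is scaled by its weight before summing, which is the
    # core downweighting idea behind pseudo posterior mechanisms.
    return float(np.sum(np.asarray(weights) * np.asarray(log_liks)))

# Example: three records with increasing disclosure risk.
w = pseudo_posterior_weights([0.1, 0.5, 0.9])
print(w)  # riskiest record receives the smallest weight
print(weighted_log_likelihood([-1.0, -2.0, -0.5], w))
```

With all weights equal to 1, the weighted sum reduces to the ordinary log-likelihood; shrinking a record's weight toward 0 removes its influence on the posterior.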
External Datasets
U.S. Occupational Safety and Health Administration (OSHA) Severe Injury Reports