Abstract
Machine learning (ML) models can leak information about users, and
differential privacy (DP) provides a rigorous way to bound that leakage under a
given budget. This DP budget can be regarded as a new type of compute resource
in workloads of multiple ML models training on user data. Once it is used, the
DP budget is consumed forever. Therefore, it is crucial to allocate it
efficiently to train as many models as possible. This paper presents a
scheduler for privacy that optimizes for efficiency. We formulate privacy
scheduling as a new type of multidimensional knapsack problem, called privacy
knapsack, which maximizes DP budget efficiency. We show that privacy knapsack
is NP-hard, hence practical algorithms are necessarily approximate. We develop
an approximation algorithm for privacy knapsack, DPack, and evaluate it on
microbenchmarks and on a new, synthetic private-ML workload we developed from
the Alibaba ML cluster trace. We show that DPack: (1) often approaches the
efficiency-optimal schedule, (2) consistently schedules more tasks than a
state-of-the-art privacy scheduling algorithm that focuses on fairness
(1.3-1.7x on the Alibaba workload, 1.0-2.6x on microbenchmarks), but (3) sacrifices some
level of fairness for efficiency. Therefore, using DPack, DP ML operators
should be able to train more models on the same amount of user data while
offering the same privacy guarantee to their users.
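
To make the knapsack framing concrete, the following is a minimal sketch of a generic greedy heuristic for a multidimensional knapsack over privacy budgets. It is not the DPack algorithm, whose details are not given in this abstract. The names `Task`, `greedy_schedule`, `weight`, `demand`, and `capacity` are hypothetical, and the sketch assumes each data block has a single scalar epsilon budget and that each task requests a fixed epsilon from some blocks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    weight: float              # assumed utility of scheduling this task
    demand: dict[str, float]   # block id -> epsilon requested from that block


def greedy_schedule(tasks, capacity):
    """Generic greedy heuristic (an illustration, not DPack itself).

    Ranks tasks by weight per unit of normalized budget demand, then
    admits each task only if every block it touches has enough budget left.
    """
    remaining = dict(capacity)

    def efficiency(task):
        # Normalize each demand by the block's total budget so that
        # blocks with different capacities are comparable.
        cost = sum(eps / capacity[b] for b, eps in task.demand.items())
        return task.weight / cost if cost > 0 else float("inf")

    scheduled = []
    for task in sorted(tasks, key=efficiency, reverse=True):
        if all(remaining[b] >= eps for b, eps in task.demand.items()):
            for b, eps in task.demand.items():
                remaining[b] -= eps  # budget is consumed permanently
            scheduled.append(task.name)
    return scheduled, remaining


# Hypothetical workload: two data blocks, each with epsilon budget 1.0.
capacity = {"b1": 1.0, "b2": 1.0}
tasks = [
    Task("A", 1.0, {"b1": 0.6, "b2": 0.6}),
    Task("B", 1.0, {"b1": 0.5}),
    Task("C", 1.0, {"b2": 0.5}),
    Task("D", 1.0, {"b1": 0.5}),
]
scheduled, remaining = greedy_schedule(tasks, capacity)
print(scheduled, remaining)
```

In this toy workload, the task that demands budget from both blocks ranks last, so the three cheaper tasks are admitted first and exhaust block `b1` before it can run. This illustrates why the order of allocation determines how many tasks fit within a fixed budget.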