Paper Information
- Author
- Joann Qiongna Chen; Xinlei He; Zheng Li; Yang Zhang; Zhou Li
- Published
- 10-16-2023
- Affiliation
- University of California, Irvine
- Country
- United States of America
- Conference
- Proc. Priv. Enhancing Technol.
Abstract
Training a machine learning model with data following a meaningful order,
i.e., from easy to hard, has proven effective in accelerating the training
process and achieving better model performance. The key enabling technique is
curriculum learning (CL), which has seen great success and has been deployed
in areas like image and text classification. Yet how CL affects the privacy
of machine learning is unclear. Given that CL changes the way a model
memorizes its training data, its influence on data privacy needs to be
thoroughly evaluated. To fill this knowledge gap, we perform the first such
study, leveraging the membership inference attack (MIA) and the attribute
inference attack (AIA) as two vectors to quantify the privacy leakage caused
by CL.
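
As a concrete illustration of the easy-to-hard ordering that CL implies, the
following is a minimal Python sketch of curriculum-style batch selection with
a pacing function. The loss-based difficulty proxy, the pacing form, and all
function names here are assumptions for illustration and do not reproduce the
paper's specific CL methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def curriculum_order(scores):
    """Rank training samples easy-to-hard by a difficulty score
    (here assumed to be a per-sample loss from a reference model;
    the paper's CL methods may use other scoring functions)."""
    return np.argsort(scores)  # low loss = easy = first

def curriculum_batches(order, n_steps, batch_size, pace):
    """Draw batches from a growing easy-to-hard prefix; pace(t) gives
    the fraction of the data exposed at normalized step t."""
    for t in range(1, n_steps + 1):
        frac = min(1.0, pace(t / n_steps))
        pool = order[: max(batch_size, int(frac * len(order)))]
        yield rng.choice(pool, size=batch_size, replace=False)

# Toy usage: 1000 samples with hypothetical per-sample losses.
scores = rng.exponential(size=1000)
order = curriculum_order(scores)
for idx in curriculum_batches(order, n_steps=5, batch_size=32,
                              pace=lambda t: 0.2 + 0.8 * t):
    pass  # a real train_step(X[idx], y[idx]) would go here
```

The pacing function controls how quickly harder samples enter the training
pool; CL variants differ mainly in their choice of scoring and pacing
functions.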
Our evaluation of nine real-world datasets with multiple attack methods
(NN-based, metric-based, and label-only MIA, as well as NN-based AIA) reveals
new insights about CL. First, MIA becomes slightly more effective when CL is
applied, and the impact is much more prominent for a subset of training
samples ranked as difficult. Second, a model trained under CL is less
vulnerable to AIA than to MIA. Third, existing defense techniques like
DP-SGD, MemGuard, and MixupMMD remain effective under CL, though DP-SGD
significantly reduces target-model accuracy. Finally, based on our insights
into CL, we propose a new MIA, termed Diff-Cali, which exploits the
difficulty scores for result calibration and is demonstrated to be effective
against all CL methods as well as normal training. With this study, we hope
to draw the community's attention to the unintended privacy risks of emerging
machine-learning techniques and to develop new attack benchmarks and defense
solutions.
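
To make the calibration idea behind Diff-Cali concrete, here is a hedged
Python sketch that adjusts a baseline metric-based membership score by each
sample's difficulty rank. The confidence baseline, the linear adjustment, and
the `alpha` weight are illustrative assumptions, not the paper's actual
calibration formula.

```python
import numpy as np

def confidence_mia_score(probs, labels):
    """Baseline metric-based MIA: higher posterior confidence on the
    true label is taken as evidence of membership."""
    return probs[np.arange(len(labels)), labels]

def difficulty_calibrated_score(raw_scores, difficulty, alpha=0.5):
    """Hypothetical calibration in the spirit of Diff-Cali: boost the
    membership score of difficult samples, which the paper finds are
    more exposed under CL. The linear form and alpha are illustrative
    assumptions, not the paper's formula."""
    n = len(difficulty)
    rank = np.argsort(np.argsort(difficulty)) / (n - 1)  # 0 = easiest
    return raw_scores + alpha * rank

# Toy usage with random posteriors and difficulty scores.
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(10), size=100)   # 100 samples, 10 classes
labels = rng.integers(0, 10, size=100)
difficulty = rng.random(100)
scores = difficulty_calibrated_score(
    confidence_mia_score(probs, labels), difficulty)
members = scores > np.median(scores)  # threshold depends on attack setup
```

The intuition is that an uncalibrated score conflates "confident because the
sample is a member" with "confident because the sample is easy"; weighting by
difficulty rank separates the two.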