Abstract
Obtaining high-quality data for collaborative training of machine learning
models can be a challenging task due to A) regulatory concerns and B) a lack of
data owner incentives to participate. The first issue can be addressed through
the combination of distributed machine learning techniques (e.g. federated
learning) and privacy-enhancing technologies (PETs), such as differentially
private (DP) model training. The second challenge can be addressed by rewarding
participants for providing access to data that is beneficial to the trained
model, which is of particular importance in federated settings, where the data
is unevenly distributed. However, DP noise can adversely affect
underrepresented and atypical (yet often informative) data samples, making it
difficult to assess their usefulness. In this work, we investigate how to
leverage gradient information to permit the participants in private training
settings to select the data most beneficial for the jointly trained model. We
assess two such methods, namely the variance of gradients (VoG) and the privacy
loss-input susceptibility (PLIS) score. We show that these techniques can
provide federated clients with tools for principled data selection even in
stricter privacy settings.
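
As a rough illustration of the first of these scores, the sketch below computes a per-sample variance-of-gradients (VoG) style score by collecting per-input gradients at several training checkpoints and measuring how much they vary. It is a minimal sketch, not the paper's implementation: the function name `vog_scores`, the PyTorch setting, and the use of loss gradients with respect to the inputs are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def vog_scores(checkpoints, x, y, loss_fn=F.cross_entropy):
    """Hypothetical per-sample VoG-style scores for a batch (x, y).

    checkpoints: iterable of models saved at different training steps
    Returns one score per sample: the variance of its input gradient
    across checkpoints, averaged over input dimensions.
    """
    grads = []
    for model in checkpoints:
        model.eval()
        inputs = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(inputs), y)
        # gradient of the (scalar) loss w.r.t. every input sample
        (g,) = torch.autograd.grad(loss, inputs)
        grads.append(g.detach())
    grads = torch.stack(grads)              # (n_checkpoints, batch, ...)
    # variance across checkpoints, averaged over the input dimensions
    return grads.var(dim=0).flatten(1).mean(dim=1)
```

Samples with higher scores under such a measure would be flagged as atypical or harder to learn, which is the kind of per-sample signal the abstract refers to for principled data selection.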