When machine learning models are trained on synthetic data and then deployed
on real data, there is often a performance drop due to the distribution shift
between synthetic and real data. In this paper, we introduce a new ensemble
strategy for training downstream models, with the goal of enhancing their
performance when used on real data. We generate multiple synthetic datasets by
applying a differential privacy (DP) mechanism several times in parallel and
then ensemble the downstream models trained on these datasets. While each
synthetic dataset may individually deviate more from the real data
distribution, together the datasets increase sample diversity, which can make
downstream models more robust to the distribution shift. Our extensive
experiments reveal that for synthetic data generated by marginal-based or
workload-based DP mechanisms, ensembling does not improve downstream
performance compared with training a single model; for GAN-based DP
mechanisms, however, our proposed ensemble strategy improves both the accuracy
and the calibration of downstream models.
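
To make the strategy above concrete, the sketch below generates several synthetic
datasets in parallel, trains one downstream classifier per dataset, and averages
the classifiers' predicted probabilities at test time. The dp_synthesize helper
(a toy noisy resampler that is not actually differentially private), the even
split of the privacy budget across runs, the choice of logistic regression as the
downstream model, and soft-voting of probabilities are all illustrative
assumptions rather than details specified in this paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def dp_synthesize(X, y, epsilon, rng):
    """Stand-in for a DP synthetic-data mechanism (marginal-, workload-, or
    GAN-based). This toy version resamples the private data and perturbs the
    features with noise scaled by 1/epsilon; it provides NO privacy guarantee
    and exists only to keep the example runnable."""
    idx = rng.integers(0, len(X), size=len(X))
    X_syn = X[idx] + rng.normal(scale=1.0 / epsilon, size=X[idx].shape)
    return X_syn, y[idx]


def train_parallel_ensemble(X_priv, y_priv, total_epsilon, n_runs, seed=0):
    """Run the mechanism n_runs times in parallel (splitting the privacy
    budget evenly, an assumed composition choice) and train one downstream
    model per synthetic dataset."""
    rng = np.random.default_rng(seed)
    eps_per_run = total_epsilon / n_runs
    models = []
    for _ in range(n_runs):
        X_syn, y_syn = dp_synthesize(X_priv, y_priv, eps_per_run, rng)
        models.append(LogisticRegression(max_iter=1000).fit(X_syn, y_syn))
    return models


def ensemble_predict_proba(models, X):
    """Average the per-model class probabilities (soft voting)."""
    return np.mean([m.predict_proba(X) for m in models], axis=0)


if __name__ == "__main__":
    # Simulated "private" training data and held-out "real" evaluation data.
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    X_priv, X_real, y_priv, y_real = train_test_split(
        X, y, test_size=0.5, random_state=0)

    models = train_parallel_ensemble(X_priv, y_priv, total_epsilon=1.0, n_runs=5)
    proba = ensemble_predict_proba(models, X_real)
    print("ensemble accuracy on held-out data:",
          accuracy_score(y_real, proba.argmax(axis=1)))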