SoK: Data Minimization in Machine Learning

TOP 文献データベース SoK: Data Minimization in Machine Learning

arxiv

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2508.10836

PDF

https://arxiv.org/pdf/2508.10836

文献情報

作者: Robin Staab,Nikola Jovanović,Kimberly Mai,Prakhar Ganesh,Martin Vechev,Ferdinando Fioretto,Matthew Jagielski
公開日: 2025-8-15
所属機関: ETH Zurich
所属の国: Switzerland
会議名: Computing Research Repository (CoRR)

AIにより推定されたラベル

RAG 差分プライバシープライバシー評価

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Data minimization (DM) describes the principle of collecting only the data strictly necessary for a given task. It is a foundational principle across major data protection regulations like GDPR and CPRA. Violations of this principle have substantial real-world consequences, with regulatory actions resulting in fines reaching hundreds of millions of dollars. Notably, the relevance of data minimization is particularly pronounced in machine learning (ML) applications, which typically rely on large datasets, resulting in an emerging research area known as Data Minimization in Machine Learning (DMML). At the same time, existing work on other ML privacy and security topics often addresses concerns relevant to DMML without explicitly acknowledging the connection. This disconnect leads to confusion among practitioners, complicating their efforts to implement DM principles and interpret the terminology, metrics, and evaluation criteria used across different research communities. To address this gap, our work introduces a comprehensive framework for DMML, including a unified data pipeline, adversaries, and points of minimization. This framework allows us to systematically review the literature on data minimization and \emph{DM-adjacent} methodologies, for the first time presenting a structured overview designed to help practitioners and researchers effectively apply DM principles. Our work facilitates a unified DM-centric understanding and broader adoption of data minimization strategies in AI/ML.

外部データセット

CIFAR-10

SVHN

MNIST

ImageNet

COCO

Places365

CelebA

UTKFace

Adult

Folktables

LendingClub

MIMIC-III

eICU

AG News

GLUE

Common Crawl

The Pile

MovieLens