Abstract
The growing number of legal disputes over the unauthorized use of data in
machine learning (ML) systems highlights the urgent need for reliable data-use
auditing mechanisms to ensure accountability and transparency in ML. We present
the first proactive, instance-level, data-use auditing method, which enables
data owners to audit the use of their individual data instances in ML models
and thus provides finer-grained auditing results than previous work. To do so,
we generalize prior work that integrates black-box membership inference with
sequential hypothesis testing, expanding its scope of application while
preserving the quantifiable and tunable false-detection rate that is its
hallmark. We evaluate our method on three types of visual ML models: image
classifiers, visual encoders, and vision-language models (Contrastive
Language-Image Pretraining (CLIP) and Bootstrapping Language-Image Pretraining
(BLIP) models). In addition, we apply our method to evaluate the performance of
two state-of-the-art approximate unlearning methods. As a noteworthy second
contribution, our work reveals that neither method successfully removes the
influence of the unlearned data instances from image classifiers and CLIP
models, even when sacrificing $10\%$ of model utility.
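
To make the statistical core concrete, below is a minimal sketch of a
Wald-style sequential probability ratio test applied to a stream of black-box
membership-inference outcomes. The function name, the Bernoulli success rates
`p0`/`p1`, and the example outcome stream are illustrative assumptions rather
than the paper's exact procedure; the sketch only shows how stopping thresholds
derived from a target false-detection rate `alpha` keep that rate quantifiable
and tunable.

```python
import math

def sequential_audit(outcomes, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Hypothetical Wald SPRT over membership-inference outcomes.

    H0: the model was NOT trained on the owner's data (per-trial
        success rate p0, e.g. chance level).
    H1: the model WAS trained on it (success rate p1 > p0).
    alpha bounds the false-detection rate; beta the missed-detection rate.
    """
    upper = math.log((1 - beta) / alpha)  # crossing -> accept H1 (data use)
    lower = math.log(beta / (1 - alpha))  # crossing -> accept H0 (no use)
    llr = 0.0
    for i, b in enumerate(outcomes, start=1):
        # Log-likelihood-ratio increment for one Bernoulli observation.
        llr += math.log(p1 / p0) if b else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "data use detected", i
        if llr <= lower:
            return "no evidence of data use", i
    return "inconclusive", len(outcomes)

# Illustrative stream of mostly-successful membership-inference trials.
print(sequential_audit([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]))
# -> ('data use detected', 12)
```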