Abstract
The growing number of legal disputes over the unauthorized use of data in
machine learning (ML) systems highlights the urgent need for reliable data-use
auditing mechanisms to ensure accountability and transparency in ML. We present
the first proactive, instance-level, data-use auditing method, which enables
data owners to audit the use of their individual data instances in ML models
and thus provides finer-grained auditing results than previous work. To do so,
we generalize prior work that integrates black-box membership inference with
sequential hypothesis testing, expanding its scope of application while
preserving the quantifiable and tunable false-detection rate that is its
hallmark. We evaluate our method on three types of visual ML models: image
classifiers, visual encoders, and vision-language models (Contrastive
Language-Image Pretraining (CLIP) and Bootstrapping Language-Image Pretraining
(BLIP) models). In addition, we apply our method to evaluate the performance of
two state-of-the-art approximate unlearning methods. As a noteworthy second
contribution, our work reveals that neither method successfully removes the
influence of the unlearned data instances from image classifiers and CLIP
models, even when sacrificing $10\%$ of model utility.
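
To make the statistical core concrete, below is a minimal sketch of a
Wald-style sequential probability ratio test applied to a stream of black-box
membership-inference outcomes. The function name, the Bernoulli success rates
`p0`/`p1`, and the example outcome stream are illustrative assumptions rather
than the paper's exact procedure; the sketch only shows how stopping thresholds
derived from a target false-detection rate `alpha` keep that rate quantifiable
and tunable.

```python
import math

def sequential_audit(outcomes, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Hypothetical Wald SPRT over membership-inference outcomes.

    H0: the model was NOT trained on the owner's data (per-trial
        success rate p0, e.g. chance level).
    H1: the model WAS trained on it (success rate p1 > p0).
    alpha bounds the false-detection rate; beta the missed-detection rate.
    """
    upper = math.log((1 - beta) / alpha)  # crossing -> accept H1 (data use)
    lower = math.log(beta / (1 - alpha))  # crossing -> accept H0 (no use)
    llr = 0.0
    for i, b in enumerate(outcomes, start=1):
        # Log-likelihood-ratio increment for one Bernoulli observation.
        llr += math.log(p1 / p0) if b else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "data use detected", i
        if llr <= lower:
            return "no evidence of data use", i
    return "inconclusive", len(outcomes)

# Illustrative stream of mostly-successful membership-inference trials.
print(sequential_audit([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]))
# -> ('data use detected', 12)
```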