BLIA: Detect model memorization in binary classification model through passive Label Inference attack


arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2503.12801

PDF

https://arxiv.org/pdf/2503.12801

Paper Information

Authors: Mohammad Wahiduzzaman Khan, Sheng Chen, Ilya Mironov, Leizhen Zhang, Rabib Noor
Published: March 17, 2025
Affiliation: Unknown Institution
Country: Unknown Country
Venue: Computing Research Repository (CoRR)

Labels Estimated by AI

Attack Method, Differential Privacy, Data Curation

These labels were assigned automatically by AI and may be inaccurate.
For details, see the "About Literature Database" page.

Abstract

Model memorization has implications for both the generalization capacity of machine learning models and the privacy of their training data. This paper investigates label memorization in binary classification models through two novel passive label inference attacks (BLIA). These attacks operate passively, relying solely on the outputs of pre-trained models, such as confidence scores and log-loss values, without interacting with or modifying the training process. By intentionally flipping 50% of the labels in controlled subsets, termed "canaries," we evaluate the extent of label memorization under two conditions: models trained without label differential privacy (Label-DP) and those trained with randomized response-based Label-DP. Despite the application of varying degrees of Label-DP, the proposed attacks consistently achieve success rates exceeding 50%, surpassing the baseline of random guessing and conclusively demonstrating that models memorize training labels, even when these labels are deliberately uncorrelated with the features.
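The abstract combines three ingredients: randomized-response Label-DP, canaries whose labels are half flipped, and a passive attack that reads only the model's confidence scores or log-loss. The Python sketch below illustrates how they fit together under simplifying assumptions. It is not the authors' implementation. The function names (`randomized_response`, `make_canaries`, `infer_label`, `attack_success_rate`) are hypothetical. The inference rule (guess whichever candidate label has the lower log-loss) is one simple passive attack and may differ from the two BLIA variants in the paper. The "model" is a toy lookup that fully memorizes its noised training labels with a fixed confidence of 0.9, standing in for a real trained classifier.

```python
import math
import random


def rr_keep_probability(epsilon: float) -> float:
    """Probability that binary randomized response reports the true label."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(label: int, epsilon: float, rng: random.Random) -> int:
    """epsilon-label-DP randomized response for a binary label in {0, 1}."""
    return label if rng.random() < rr_keep_probability(epsilon) else 1 - label


def make_canaries(labels, canary_idx, rng: random.Random):
    """Hypothetical canary construction: flip a random 50% of the labels at the
    canary indices, so canary labels are uncorrelated with the features."""
    labels = list(labels)
    idx = list(canary_idx)
    rng.shuffle(idx)
    for i in idx[: len(idx) // 2]:
        labels[i] = 1 - labels[i]
    return labels


def log_loss(p_pos: float, label: int, eps: float = 1e-12) -> float:
    """Binary log-loss of predicted P(y=1) = p_pos against `label`."""
    p = min(max(p_pos, eps), 1.0 - eps)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def infer_label(p_pos: float) -> int:
    """Passive guess: the candidate label with the lower log-loss (ties -> 1)."""
    return 0 if log_loss(p_pos, 0) < log_loss(p_pos, 1) else 1


def attack_success_rate(scores, true_labels) -> float:
    """Fraction of canaries whose label the attacker recovers (baseline: 0.5)."""
    hits = sum(infer_label(p) == y for p, y in zip(scores, true_labels))
    return hits / len(true_labels)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    clean = [rng.randint(0, 1) for _ in range(n)]
    canary_labels = make_canaries(clean, range(n), rng)
    for epsilon in (0.5, 1.0, 4.0):
        trained_on = [randomized_response(y, epsilon, rng) for y in canary_labels]
        # Toy stand-in for a model that fully memorizes its (noised) training labels.
        scores = [0.9 if y == 1 else 0.1 for y in trained_on]
        rate = attack_success_rate(scores, canary_labels)
        print(f"eps={epsilon}: success={rate:.3f}, "
              f"RR keep prob={rr_keep_probability(epsilon):.3f}")
```

For this fully memorizing toy model, the measured success rate should land near e^ε/(1+e^ε), the probability that randomized response keeps a binary label. For a uniformly random label, ε-label-DP caps any attacker's expected success at that same value. A rate above the 50% random-guessing baseline, but below this cap, is therefore the pattern one would expect from memorization under Label-DP. A real model that memorizes only partially would score lower.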

External Datasets

CENSUS

FashionMNIST

IMDB Sentiment Analysis

CIFAR-10

CIFAR-100

Big-Vul