Counterfactual Influence as a Distributional Quantity

arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2506.20481

PDF

https://arxiv.org/pdf/2506.20481

Paper Information

Authors: Matthieu Meeus, Igor Shilov, Georgios Kaissis, Yves-Alexandre de Montjoye
Published: 2025-6-25
Affiliation: Imperial College London
Country: United Kingdom
Venue: Computing Research Repository (CoRR)

AI-Estimated Labels

Evaluation metrics, Privacy protection, Performance evaluation metrics

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Machine learning models are known to memorize samples from their training data, raising concerns around privacy and generalization. Counterfactual self-influence is a popular metric to study memorization, quantifying how the model's prediction for a sample changes depending on the sample's inclusion in the training dataset. However, recent work has shown memorization to be affected by factors beyond self-influence, with other training samples, in particular (near-)duplicates, having a large impact. We here study memorization treating counterfactual influence as a distributional quantity, taking into account how all training samples influence how a sample is memorized. For a small language model, we compute the full influence distribution of training samples on each other and analyze its properties. We find that solely looking at self-influence can severely underestimate tangible risks associated with memorization: the presence of (near-)duplicates seriously reduces self-influence, while we find these samples to be (near-)extractable. We observe similar patterns for image classification, where simply looking at the influence distributions reveals the presence of near-duplicates in CIFAR-10. Our findings highlight that memorization stems from complex interactions across training data and is better captured by the full influence distribution than by self-influence alone.
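
The abstract's notion of influence can be made concrete with a minimal sketch. It assumes a subset-sampling estimator: train many models on different subsets of the data, then compare each sample's score between models that did and did not include a given training sample. This is a common way to estimate counterfactual influence, but the abstract does not state the paper's exact estimator. The names `influence_matrix`, `masks`, and `scores` are hypothetical, and the toy data are synthetic rather than results from the paper.

```python
import itertools

import numpy as np


def influence_matrix(masks, scores):
    """Estimate counterfactual influence between training samples.

    masks[m, i]  : True if sample i was in the training subset of model m.
    scores[m, j] : performance of model m on sample j (higher is better).
    Returns infl with infl[i, j] = mean score on j over models trained
    with i, minus the mean score on j over models trained without i.
    The diagonal infl[i, i] is the self-influence of sample i.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    included = masks.astype(float)           # shape (n_models, n_samples)
    excluded = 1.0 - included
    n_in = included.sum(axis=0)              # models containing each sample
    n_out = excluded.sum(axis=0)             # models missing each sample
    if np.any(n_in == 0) or np.any(n_out == 0):
        raise ValueError("each sample must be included and excluded at least once")
    mean_in = (included.T @ scores) / n_in[:, None]
    mean_out = (excluded.T @ scores) / n_out[:, None]
    return mean_in - mean_out


# Synthetic toy setup (an assumption for illustration): three samples, where
# samples 0 and 1 are exact duplicates and sample 2 is unique. One "model" is
# simulated for each of the 8 possible training subsets.
masks = np.array(list(itertools.product([False, True], repeat=3)))

# Assume a sample is scored 1.0 whenever it, or its duplicate, was trained on.
dup_seen = masks[:, 0] | masks[:, 1]
scores = np.stack([dup_seen, dup_seen, masks[:, 2]], axis=1).astype(float)

infl = influence_matrix(masks, scores)
self_influence = np.diag(infl)
print(infl)
```

In this toy setup the duplicated samples have a lower self-influence than the unique sample, and each duplicate influences the other as strongly as it influences itself. That is the pattern the abstract describes: the diagonal alone understates memorization, while the full row or column of the influence matrix exposes the duplicate.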

External Datasets

Natural Questions

CIFAR-10