LUMIA: Linear probing for Unimodal and MultiModal Membership Inference Attacks leveraging internal LLM states

Authors: Luis Ibanez-Lissen, Lorena Gonzalez-Manzano, Jose Maria de Fuentes, Nicolas Anciaux, Joaquin Garcia-Alfaro | Published: 2024-11-29 | Updated: 2025-01-10

2024.11.292025.04.03

Authors: Luis Ibanez-Lissen, Lorena Gonzalez-Manzano, Jose Maria de Fuentes, Nicolas Anciaux, Joaquin Garcia-Alfaro
Published: 2024-11-29 | Updated: 2025-01-10

Source: https://arxiv.org/abs/2411.19876

PDF: https://arxiv.org/pdf/2411.19876

AIにより推定されたラベル

メンバーシップ推論 LLM性能評価

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Large Language Models (LLMs) are increasingly used in a variety of applications, but concerns around membership inference have grown in parallel. Previous efforts focus on black-to-grey-box models, thus neglecting the potential benefit from internal LLM information. To address this, we propose the use of Linear Probes (LPs) as a method to detect Membership Inference Attacks (MIAs) by examining internal activations of LLMs. Our approach, dubbed LUMIA, applies LPs layer-by-layer to get fine-grained data on the model inner workings. We test this method across several model architectures, sizes and datasets, including unimodal and multimodal tasks. In unimodal MIA, LUMIA achieves an average gain of 15.71 techniques. Remarkably, LUMIA reaches AUC>60 increment of 46.80 reveals key insights, such as the model layers where MIAs are most detectable. In multimodal models, LPs indicate that visual inputs can significantly contribute to detect MIAs – AUC>60