AI Security Portal K Program
LLM-PBE: Assessing Data Privacy in Large Language Models
Abstract
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs. To address this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. To broaden knowledge in this area, we make the findings, resources, and our full technical report available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.