LLM-PBE: Assessing Data Privacy in Large Language Models | AI Security Portal

arxiv

LLM-PBE: Assessing Data Privacy in Large Language Models

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2408.12787

PDF

https://arxiv.org/pdf/2408.12787

Bibliographic information

Authors: Qinbin Li; Junyuan Hong; Chulin Xie; Jeffrey Tan; Rachel Xin; Junyi Hou; Xavier Yin; Zhun Wang; Dan Hendrycks; Zhangyang Wang; Bo Li; Bingsheng He; Dawn Song
Published: 2024-8-23
Updated: 2024-9-6
Affiliation: University of California, Berkeley
Country of affiliation: United States of America
Venue: Proc. VLDB Endow.

Labels estimated by AI

Prompt injection, Privacy protection methods, LLM security

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.

External datasets

Enron

ECHR

Github

BlackFriday

SynthPAI

References

CCS: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security

Deep Learning with Differential Privacy

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

Published: 2016

Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Privacy-preserving data mining

Rakesh Agrawal, Ramakrishnan Srikant

Published: 2000

Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics

FLAIR: An easy-to-use framework for state-of-the-art NLP

Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, Roland Vollgraf

Published: 2019

ACM SIGMOD Record

From large language models to databases and back: A discussion on research and education

Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang

Published: 2023

Introducing Claude

Published: 2023

maxbachmann/RapidFuzz: Release 1.8.0

Published: 2021

Explaining neural scaling laws

Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma

Published: 2021

21st International conference on data engineering (ICDE’05)

Data privacy through optimal k-anonymization

Roberto J Bayardo, Rakesh Agrawal

Published: 2005

Proceedings of the ACM Web Conference 2023

Cam: A large language model-based creative analogy mining framework

Bhavya Bhavya, Jinjun Xiong, Chengxiang Zhai

Published: 2023

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal

Published: 2023

ER-AE: Differentially private text generation for authorship anonymization

Haohan Bo, Steven HH Ding, Benjamin Fung, Farkhund Iqbal

Published: 2019

SIGMOD record

From Large Language Models to Databases and Back: A discussion on research and education

Angela Bonifati, Sihem Amer-Yahia, Chen Lei, Li Guoliang, Shim Kyuseok, Xu Jianliang, Yang Xiaochun

Published: 2023

2021 IEEE Symposium on Security and Privacy (SP)

Machine unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot

Published: 2021

Zero redundancy distributed learning with differential privacy

Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis

Published: 2023