Protecting User Privacy in Remote Conversational Systems: A Privacy-Preserving framework based on text sanitization

arxiv

Source

https://arxiv.org/abs/2306.08223

PDF

https://arxiv.org/pdf/2306.08223

Paper Information

Authors: Zhigang Kan; Linbo Qiao; Hao Yu; Liwen Peng; Yifu Gao; Dongsheng Li
Published: 2023-06-14
Affiliation: National University of Defense Technology
Country: China
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Privacy Technique, Data Protection Method, Information Extraction

These labels were automatically added by AI and may be inaccurate.

Abstract

Large Language Models (LLMs) are gaining increasing attention due to their exceptional performance across numerous tasks. As a result, the general public uses them as a tool for boosting productivity, while natural language processing researchers endeavor to employ them in solving existing or new research problems. Unfortunately, individuals can only access such powerful AIs through APIs, which ultimately leads to the transmission of raw data to the models' providers and increases the possibility of private data leakage. Current privacy-preserving methods for cloud-deployed language models aim to protect private information in the pre-training dataset or during the model training phase. However, they do not meet the specific challenges presented by the remote access approach of new large-scale language models. This paper introduces a novel task, "User Privacy Protection for Dialogue Models," which aims to safeguard sensitive user information from any possible disclosure while conversing with chatbots. We also present an evaluation scheme for this task, which covers evaluation metrics for privacy protection, data availability, and resistance to simulation attacks. Moreover, we propose the first framework for this task, namely privacy protection through text sanitization. Before sending the input to remote large models, it filters out the sensitive information, using several rounds of text sanitization based on privacy types that users define. Upon receiving responses from the remote model, our framework automatically restores the private information so that the conversation proceeds smoothly, without interference from the privacy filter. Experiments based on real-world datasets demonstrate the efficacy of our privacy-preserving approach against eavesdropping from potential attackers.
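The sanitize-then-restore flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how sensitive spans are detected, so the sketch assumes simple regular-expression detectors. The names `PATTERNS`, `sanitize`, and `restore`, the two privacy types, and the `<TYPE_n>` placeholder format are all hypothetical choices made for illustration.

```python
import re

# Hypothetical detectors, one per user-defined privacy type.
# A real system would likely use a learned extractor rather than regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def sanitize(text, privacy_types):
    """Replace sensitive spans with placeholders, one round per privacy type.

    Returns the sanitized text and a placeholder -> original value mapping
    that stays on the user's side and is never sent to the remote model.
    """
    mapping = {}
    for ptype in privacy_types:
        prefix = f"<{ptype}_"

        def _replace(match, prefix=prefix):
            value = match.group(0)
            # Reuse the placeholder if this exact value was seen before.
            for placeholder, original in mapping.items():
                if original == value:
                    return placeholder
            index = sum(1 for key in mapping if key.startswith(prefix))
            placeholder = f"{prefix}{index}>"
            mapping[placeholder] = value
            return placeholder

        text = PATTERNS[ptype].sub(_replace, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into a response from the remote model."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


if __name__ == "__main__":
    prompt = "Email alice@example.com or call 555-123-4567."
    safe_prompt, secrets = sanitize(prompt, ["EMAIL", "PHONE"])
    print(safe_prompt)
    # Stand-in for the remote model's reply, which only ever sees placeholders.
    reply = "I will write to <EMAIL_0> and then call <PHONE_0>."
    print(restore(reply, secrets))
```

Keeping the mapping local is the point of the design: only the placeholder text crosses the API boundary, and restoration happens after the response comes back.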

External Datasets

ACE2005