You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas

arxiv

Source

https://arxiv.org/abs/2205.10228

PDF

https://arxiv.org/pdf/2205.10228

Paper Information

Authors: Haoran Li; Yangqiu Song; Lixin Fan
Published: 4-26-2022
Affiliation: Dept. of CSE, Hong Kong University of Science and Technology
Country: Hong Kong
Conference

Labels Estimated by AI

Privacy Leakage

Loss Function

Attackers and Malicious Devices

These labels were automatically added by AI and may be inaccurate.

Abstract

Social chatbots, also known as chit-chat chatbots, are evolving rapidly with large pretrained language models. Despite this progress, privacy concerns have arisen recently: training data of large language models can be extracted via model inversion attacks. Moreover, the datasets used for training chatbots contain many private conversations between two individuals. In this work, we further investigate the privacy leakage of the hidden states of chatbots trained by language modeling, which has not been well studied yet. We show that speakers' personas can be inferred with high accuracy by a simple neural network. To address this, we propose effective defense objectives that prevent persona leakage from hidden states. Extensive experiments demonstrate that the proposed defense objectives reduce the attack accuracy from 37.6% to 0.5% while preserving language models' powerful generation ability.
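
The attack setting in the abstract, where an adversary trains a small classifier to read persona labels out of hidden states, can be illustrated with a toy sketch. This is not the authors' implementation. The "hidden states" are synthetic vectors (one random centre per persona plus Gaussian noise), the attacker is a single linear layer with softmax rather than whatever network the paper uses, and every name here (make_hidden_states, train_attacker, accuracy) is hypothetical. The defense objectives are not sketched because the abstract does not specify them.

```python
import math
import random


def make_hidden_states(n_per_class, n_classes, dim, noise, rng):
    """Synthetic stand-in for chatbot hidden states (an assumption, not real data):
    each persona gets a random centre, and samples are that centre plus noise."""
    centres = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    data = []
    for label, centre in enumerate(centres):
        for _ in range(n_per_class):
            vector = [c + rng.gauss(0.0, noise) for c in centre]
            data.append((vector, label))
    rng.shuffle(data)
    return data


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, bias, x):
    """One linear layer: a score per persona class."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, bias)]


def train_attacker(data, n_classes, dim, epochs=30, lr=0.05):
    """Hypothetical attacker: linear layer + softmax, trained with plain SGD
    on cross-entropy to predict the persona label from a hidden-state vector."""
    weights = [[0.0] * dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in data:
            probs = softmax(logits_for(weights, bias, x))
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit k is p_k - 1[k == y].
                grad = probs[k] - (1.0 if k == y else 0.0)
                bias[k] -= lr * grad
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of samples whose highest-scoring class matches the true label."""
    correct = 0
    for x, y in data:
        scores = logits_for(weights, bias, x)
        if scores.index(max(scores)) == y:
            correct += 1
    return correct / len(data)


rng = random.Random(0)
samples = make_hidden_states(n_per_class=40, n_classes=5, dim=8, noise=0.5, rng=rng)
train_set, test_set = samples[:160], samples[160:]
weights, bias = train_attacker(train_set, n_classes=5, dim=8)
print("attack accuracy:", accuracy(weights, bias, test_set), "chance:", 1 / 5)
```

If the representations encode the persona label, the attacker's held-out accuracy sits well above the chance rate of 1 / n_classes. A defense of the kind the abstract describes would aim to push that accuracy back toward chance without harming generation quality.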

External Datasets

PersonaChat

Dialogue NLI