Abstract
Social chatbots, also known as chit-chat chatbots, have evolved rapidly with
large pretrained language models. Despite this progress, privacy concerns have
recently arisen: the training data of large language models can be extracted
via model inversion attacks. Moreover, the datasets used to train chatbots
contain many private conversations between two individuals. In this work, we
further investigate the privacy leakage of the hidden states of chatbots
trained by language modeling, which has not yet been well studied. We show
that speakers' personas can be inferred from these hidden states through a
simple neural network with high accuracy. To address this, we propose
effective defense objectives that prevent persona leakage from hidden states.
We conduct extensive experiments to demonstrate that our proposed defense
objectives greatly reduce the attack accuracy, from 37.6% to 0.5%. Meanwhile,
the proposed objectives preserve the language models' powerful generation
ability.
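To make the attack/defense setup concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the attacker is a one-hidden-layer MLP probe that maps a single hidden-state vector to a distribution over persona labels. As one illustrative defense objective, it adds a penalty on the KL divergence between the probe's prediction and the uniform distribution to the language-modeling loss. All names (`mlp_probe`, `kl_to_uniform`, `defended_lm_loss`, `attack_accuracy`) and the weight `lam` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_probe(hidden: Sequence[float],
              w1: Sequence[Sequence[float]], b1: Sequence[float],
              w2: Sequence[Sequence[float]], b2: Sequence[float]) -> List[float]:
    # Hypothetical attacker: ReLU(W1 h + b1), then a linear layer
    # producing one logit per persona label, then softmax.
    z = [max(0.0, sum(w * h for w, h in zip(row, hidden)) + b)
         for row, b in zip(w1, b1)]
    logits = [sum(w * zi for w, zi in zip(row, z)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)


def attack_loss(probs: Sequence[float], label: int) -> float:
    # Cross-entropy the attacker minimizes when training the probe.
    return -math.log(probs[label])


def kl_to_uniform(probs: Sequence[float]) -> float:
    # KL(p || U) = sum_i p_i * log(p_i * K); it is zero only when the
    # probe's prediction is uniform, i.e. carries no persona information.
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)


def defended_lm_loss(lm_loss: float, probs: Sequence[float], lam: float) -> float:
    # Assumed combined objective: language-modeling loss plus a weighted
    # penalty that pushes the attacker's prediction toward uniform.
    return lm_loss + lam * kl_to_uniform(probs)


def attack_accuracy(prob_rows: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> float:
    # Fraction of examples where the probe's argmax matches the true persona.
    correct = sum(1 for p, y in zip(prob_rows, labels)
                  if max(range(len(p)), key=p.__getitem__) == y)
    return correct / len(labels)
```

In this sketch, the penalty reaches its minimum of zero when the probe cannot distinguish personas, while `lam` trades privacy against generation quality. The abstract reports attack accuracy dropping to 0.5% while generation ability is preserved, which suggests the actual objectives manage this trade-off well. The exact form of those objectives is not given here.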