Abstract
Language models trained via federated learning (FL) demonstrate impressive
capabilities in handling complex tasks while protecting user privacy. Recent
studies indicate that leveraging gradient information and prior knowledge can
potentially reveal training samples in the FL setting. However, these
investigations have overlooked the potential privacy risks tied to the
intrinsic architecture of the models. This paper presents a two-stage privacy
attack strategy that targets the vulnerabilities in the architecture of
contemporary language models, significantly enhancing attack performance by
first recovering certain feature directions to serve as additional supervisory
signals. Our comparative experiments demonstrate superior attack performance
across various datasets and scenarios, highlighting the privacy leakage risk
associated with the increasingly complex architectures of language models. We
call on the community to recognize and address these potential privacy risks
when designing large language models.