Text Embeddings Reveal (Almost) As Much As Text

arxiv

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2310.06816

PDF

https://arxiv.org/pdf/2310.06816

Bibliographic information

Authors: John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush
Published: 2023-10-11
Affiliation: Department of Computer Science, Cornell University
Country of affiliation: United States of America
Conference: Conference on Empirical Methods in Natural Language Processing (EMNLP)

Labels estimated by AI

Model inversion, Model evaluation, Membership inference

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when re-embedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes. Our code is available on GitHub: https://github.com/jxmorris12/vec2text
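
The "correct and re-embed" idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_embed` is a hypothetical one-hot encoder over a tiny vocabulary, and `correct_once` is a greedy token-substitution search standing in for the learned correction model described in the paper. It is not the vec2text API and does not reproduce the paper's results; it only shows the shape of the loop (propose a hypothesis, re-embed it, keep the change if it moves closer to the target embedding).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Embedder = Callable[[Sequence[str]], Vector]

# Hypothetical toy vocabulary; a real system works over a tokenizer's vocabulary.
VOCAB = ["alice", "bob", "visited", "called", "the", "clinic", "office", "today"]


def toy_embed(tokens: Sequence[str]) -> Vector:
    """Toy encoder (assumption): one slot per (position, word), L2-normalised."""
    vec = [0.0] * (len(tokens) * len(VOCAB))
    for pos, tok in enumerate(tokens):
        vec[pos * len(VOCAB) + VOCAB.index(tok)] = 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def correct_once(hyp: Sequence[str], target: Vector, embed: Embedder) -> List[str]:
    """One correction round: keep single-token edits whose re-embedding is closer to target."""
    best = list(hyp)
    best_sim = cosine(embed(best), target)
    for pos in range(len(best)):
        for word in VOCAB:
            cand = best[:pos] + [word] + best[pos + 1:]
            sim = cosine(embed(cand), target)  # re-embed the candidate text
            if sim > best_sim:
                best, best_sim = cand, sim
    return best


def invert(target: Vector, initial: Sequence[str], embed: Embedder,
           max_steps: int = 10) -> Tuple[List[str], List[float]]:
    """Repeat correction rounds until the similarity to the target stops improving."""
    hyp = list(initial)
    history = [cosine(embed(hyp), target)]
    for _ in range(max_steps):
        new_hyp = correct_once(hyp, target, embed)
        new_sim = cosine(embed(new_hyp), target)
        if new_sim <= history[-1]:
            break  # no further progress
        hyp = new_hyp
        history.append(new_sim)
    return hyp, history


secret = ["alice", "visited", "the", "clinic", "today"]
target_vec = toy_embed(secret)       # the attacker only sees this vector
guess = ["the"] * len(secret)        # arbitrary starting hypothesis
recovered, sims = invert(target_vec, guess, toy_embed)
print(recovered, sims)
```

In this toy setting the encoder is trivially invertible, so the loop recovers the secret tokens after a single round. That says nothing about real embedding models, where the search space is far larger and the paper relies on a trained model to propose corrections.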

External datasets

Natural Questions

MSMARCO

MIMIC-III

BEIR benchmark