Abstract
Textual data is often represented as real-valued embeddings in NLP,
particularly with the popularity of large language models (LLMs) and Embeddings
as a Service (EaaS). However, storing sensitive information as embeddings can
be susceptible to security breaches, as research shows that text can be
reconstructed from embeddings, even without knowledge of the underlying model.
While defense mechanisms have been explored, they have focused exclusively on
English, leaving other languages potentially exposed to attacks. This work
explores LLM security through multilingual embedding inversion. We define the
problem of black-box multilingual and cross-lingual inversion attacks, and
explore their potential implications. Our findings suggest that multilingual
LLMs may be more vulnerable to inversion attacks, in part because English-based
defenses may be ineffective. To alleviate this, we propose a simple masking
defense effective for both monolingual and multilingual models. This study is
the first to investigate multilingual inversion attacks, shedding light on the
differences in attacks and defenses across monolingual and multilingual
settings.
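The masking defense named above can be pictured with a minimal, hypothetical sketch (not the paper's exact method; the function name, masking rate, and zeroing strategy are assumptions for illustration): before an EaaS provider returns an embedding, a fraction of its dimensions is masked out, degrading the signal available to an inversion model while aiming to preserve downstream utility.

```python
import numpy as np

def mask_embedding(emb, mask_fraction=0.3, seed=0):
    """Zero out a random fraction of embedding dimensions before serving.

    Hypothetical illustration of a masking-style defense: an attacker
    training an inversion model sees only the masked vector, while
    similarity-based downstream tasks may still work on the rest.
    """
    rng = np.random.default_rng(seed)
    emb = np.asarray(emb, dtype=float).copy()
    n_masked = int(emb.size * mask_fraction)
    idx = rng.choice(emb.size, size=n_masked, replace=False)
    emb[idx] = 0.0
    return emb

# Toy usage: mask 30% of a 10-dimensional embedding of ones.
vec = np.ones(10)
masked = mask_embedding(vec, mask_fraction=0.3)
print(int((masked == 0.0).sum()))  # → 3
```

The trade-off such a defense must balance, as the abstract notes, is attack resistance versus utility; the masking rate would be a tunable parameter in practice.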