These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The widespread adoption of Retrieval-Augmented Generation (RAG) systems in
real-world applications has heightened concerns about the confidentiality and
integrity of their proprietary knowledge bases. These knowledge bases, which
play a critical role in enhancing the generative capabilities of Large Language
Models (LLMs), are increasingly vulnerable to breaches that could compromise
sensitive information. To address these challenges, this paper proposes an
advanced encryption methodology designed to protect RAG systems from
unauthorized access and data leakage. Our approach encrypts both textual
content and its corresponding embeddings prior to storage, ensuring that all
data remains securely encrypted. This mechanism restricts access to authorized
entities with the appropriate decryption keys, thereby significantly reducing
the risk of unintended data exposure. Furthermore, we demonstrate that our
encryption strategy preserves the performance and functionality of RAG
pipelines, ensuring compatibility across diverse domains and applications. To
validate the robustness of our method, we provide comprehensive security proofs
that highlight its resilience against potential threats and vulnerabilities.
These proofs also reveal limitations in existing approaches, which often lack
robustness, adaptability, or reliance on open-source models. Our findings
suggest that integrating advanced encryption techniques into the design and
deployment of RAG systems can effectively enhance privacy safeguards. This
research contributes to the ongoing discourse on improving security measures
for AI-driven services and advocates for stricter data protection standards
within RAG architectures.