Large language models (LLMs) are increasingly used as personal agents,
accessing sensitive user data such as calendars, emails, and medical records.
Users currently face a trade-off: they can send private records, many of which
are stored in remote databases, to powerful but untrusted LLM providers,
increasing their exposure risk, or they can run less powerful
models locally on trusted devices. We bridge this gap. Our Socratic
Chain-of-Thought Reasoning first sends a generic, non-private user query to a
powerful, untrusted LLM, which generates a Chain-of-Thought (CoT) prompt and
detailed sub-queries without accessing user data. Next, we embed these
sub-queries and perform encrypted, sub-second semantic search with our
Homomorphically Encrypted Vector Database over one million entries of a
single user's private data, a realistic scale for the personal
documents, emails, and records accumulated over years of digital activity.
Finally, we feed the CoT prompt and the decrypted records to a local language
model and generate the final response. On the LoCoMo long-context QA benchmark,
our hybrid framework, combining GPT-4o with a local Llama-3.2-1B model,
outperforms GPT-4o alone by up to 7.1 percentage points. This result is
a first step toward systems that decompose tasks and split them
between strong untrusted LLMs and weak local ones while preserving user privacy.
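
The three stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: `remote_plan`, `search`, `answer`, and the bag-of-words `embed` are hypothetical names introduced here, the remote LLM and the local model are replaced by simple stand-ins, and the search runs in plaintext where the actual system uses a homomorphically encrypted vector database. The sketch only shows the data flow, in particular that the remote planner never receives the private records.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy L2-normalised bag-of-words vector (assumed stand-in for a real encoder)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Inner product of two sparse vectors."""
    return sum(w * b[tok] for tok, w in a.items() if tok in b)


def remote_plan(generic_query: str) -> Tuple[str, List[str]]:
    """Hypothetical stand-in for the untrusted LLM.

    It receives only the generic query, never the user's records.
    """
    cot_prompt = f"Reason step by step to answer: {generic_query}"
    sub_queries = [generic_query, f"records relevant to {generic_query}"]
    return cot_prompt, sub_queries


def search(sub_queries: List[str], records: List[str], k: int = 1) -> List[str]:
    """Plaintext top-k similarity search.

    Assumption: this replaces the encrypted search of the real system,
    which would score encrypted vectors instead of plaintext ones.
    """
    index = [(rec, embed(rec)) for rec in records]
    hits: List[str] = []
    for query in sub_queries:
        qv = embed(query)
        ranked = sorted(index, key=lambda item: -dot(qv, item[1]))
        for rec, _ in ranked[:k]:
            if rec not in hits:
                hits.append(rec)
    return hits


def answer(generic_query: str, records: List[str],
           local_model: Callable[[str], str]) -> str:
    """Run the pipeline: remote planning, local retrieval, local generation."""
    cot_prompt, sub_queries = remote_plan(generic_query)   # stage 1: untrusted
    retrieved = search(sub_queries, records)                # stage 2: private
    local_input = cot_prompt + "\n\nRecords:\n" + "\n".join(retrieved)
    return local_model(local_input)                         # stage 3: trusted
```

Calling `answer(query, records, local_model)` with any callable that maps a prompt string to a response string exercises the full flow; in a real deployment the callable would wrap a small on-device model.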