Searchable Symmetric Encryption (SSE) enables efficient search capabilities
over encrypted data, allowing users to maintain privacy while utilizing cloud
storage. However, SSE schemes are vulnerable to leakage attacks that exploit
access patterns, search frequency, and volume information. Existing studies
frequently assume that adversaries possess a substantial fraction of the
encrypted dataset to mount effective inference attacks, implying a prior
leakage of those documents, an assumption that may not hold in real-world
scenarios. In this work, we investigate the feasibility of enhancing
leakage attacks under a more realistic threat model in which adversaries have
access to minimal leaked data. We propose a novel approach that leverages large
language models (LLMs), specifically GPT-4 variants, to generate synthetic
documents that statistically and semantically resemble the real-world dataset
of Enron emails. Using the email corpus as a case study, we evaluate the
effectiveness of synthetic data generated via random sampling and hierarchical
clustering methods on the performance of the SAP (Search Access Pattern)
keyword inference attack restricted to token volumes only. Our results
demonstrate that, while the choice of LLM has limited effect, increasing
dataset size and employing clustering-based generation significantly improve
attack accuracy, achieving performance comparable to attacks that use larger
amounts of real data. We highlight the growing relevance of LLMs in adversarial
contexts.