As Large Language Models (LLMs) are increasingly deployed in sensitive
domains, traditional data privacy measures prove inadequate for protecting
information that is implicit, contextual, or inferable, which we define as
semantic privacy. This Systematization of Knowledge (SoK) introduces a
lifecycle-centric framework to analyze how semantic privacy risks emerge across
the input processing, pretraining, fine-tuning, and alignment stages of LLMs. We
categorize key attack vectors and assess how current defenses, such as
differential privacy, embedding encryption, edge computing, and unlearning,
address these threats. Our analysis reveals critical gaps in semantic-level
protection, especially against contextual inference and latent representation
leakage. We conclude by outlining open challenges, including quantifying
semantic leakage, protecting multimodal inputs, balancing de-identification
with generation quality, and ensuring transparency in privacy enforcement. This
work aims to inform future research on designing robust, semantically aware
privacy-preserving techniques for LLMs.