Transformer models have revolutionized AI, powering applications like content
generation and sentiment analysis. However, their deployment in Machine
Learning as a Service (MLaaS) raises significant privacy concerns, primarily
due to the centralized processing of sensitive user data. Private Transformer
Inference (PTI) addresses this by using cryptographic techniques such as
secure multi-party computation and homomorphic encryption to run inference
while keeping both the user's data and the model private. This paper reviews recent
PTI advancements, highlighting state-of-the-art solutions and challenges. We
also introduce a structured taxonomy and evaluation framework for PTI, focusing
on the trade-off between resource efficiency and privacy, with the aim of
bridging the gap between high-performance inference and data privacy.