Large language models (LLMs) offer personalized responses based on user
interactions, but this use case raises serious privacy concerns. Homomorphic
encryption (HE) is a cryptographic technique that supports arithmetic computation
directly on encrypted data, making it a potential solution for privacy-preserving
machine learning (PPML). However, the computational intensity of transformers
poses challenges for applying HE to LLMs. In this work, we propose a modified
HE-friendly transformer architecture with an emphasis on inference following
personalized (private) fine-tuning. Using LoRA fine-tuning and Gaussian
kernels, we achieve computational speedups of 6.94x for fine-tuning and 2.3x
for inference while maintaining performance comparable to plaintext
models. Our findings provide a viable proof of concept for offering
privacy-preserving LLM services in areas where data protection is crucial.
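To make the two ingredients above concrete, here is a minimal plaintext NumPy sketch (no encryption is performed) of a LoRA-adapted projection and of Gaussian-kernel attention scores next to standard softmax. It rests on assumptions: the LoRA update uses the standard form W + (alpha/r)BA; the kernel exp(-||q - k||^2 / (2 sigma^2)) with a hypothetical bandwidth `sigma` and no row normalization is one plausible HE-friendly replacement for softmax, not necessarily the exact formulation used in this work. All function names are hypothetical.

```python
import numpy as np


def lora_linear(x, W, A, B, alpha=1.0):
    """LoRA-adapted linear layer: y = x W^T + (alpha / r) * x A^T B^T.

    W (d_out x d_in) stays frozen; only the low-rank factors
    A (r x d_in) and B (d_out x r) would be trained.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in A and B (full fine-tuning would need d_in * d_out)."""
    return r * (d_in + d_out)


def softmax_attention_weights(Q, K):
    """Standard scaled dot-product softmax, shown for comparison."""
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    S = S - S.max(axis=-1, keepdims=True)      # row-wise max: comparisons, costly under HE
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)   # division by a data-dependent sum: costly under HE


def gaussian_kernel_weights(Q, K, sigma=1.0):
    """Hypothetical Gaussian-kernel scores exp(-||q_i - k_j||^2 / (2 sigma^2)).

    Needs only additions, multiplications and one exp (which HE schemes
    approximate with a polynomial): no max and no data-dependent division.
    """
    sq_dist = (
        (Q ** 2).sum(axis=-1)[:, None]
        + (K ** 2).sum(axis=-1)[None, :]
        - 2.0 * Q @ K.T
    )
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Usage on random data (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, d_model, rank = 4, 8, 2

x = rng.standard_normal((n_tokens, d_model))
W_q = rng.standard_normal((d_model, d_model))       # frozen pretrained weight
A_q = 0.01 * rng.standard_normal((rank, d_model))   # trainable
B_q = np.zeros((d_model, rank))                     # trainable, zero-initialized

Q = lora_linear(x, W_q, A_q, B_q)
K = x @ rng.standard_normal((d_model, d_model)).T
P_soft = softmax_attention_weights(Q, K)
P_gauss = gaussian_kernel_weights(Q, K, sigma=float(np.sqrt(d_model)))
```

Zero-initializing `B_q` makes the adapted layer start out identical to the frozen one, and for a 4096 x 4096 projection with rank 8, `lora_param_count` gives 65,536 trainable parameters instead of 16,777,216, which is the kind of reduction that makes encrypted fine-tuning more tractable.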