Concerns about the tendency of Large Language Models (LLMs) to memorize and disclose
private information, particularly Personally Identifiable Information (PII),
have become prominent within the community. Many efforts have been made to mitigate
these privacy risks. However, the mechanism through which LLMs memorize PII
remains poorly understood. To bridge this gap, we introduce a pioneering method
for pinpointing PII-sensitive neurons (privacy neurons) within LLMs. Our method
employs learnable binary weight masks, optimized through adversarial training, to
localize the specific neurons that account for the memorization of PII in LLMs. Our
investigation reveals that PII is memorized by a small subset of neurons
distributed across all layers, and that these neurons exhibit PII specificity.
Furthermore, we validate the potential of our method for PII risk mitigation by
deactivating the localized privacy neurons. Both quantitative and qualitative experiments
demonstrate the effectiveness of our neuron localization algorithm.
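The abstract does not spell out implementation details; the following is a minimal sketch of the mask-learning idea, assuming PyTorch, a toy frozen model, and illustrative hyperparameters (the forget-versus-preserve objective, lambda_sparse, and the learning rate are our assumptions, not the paper's recipe). A hard 0/1 mask is derived from learnable logits via a straight-through estimator; the adversarial objective pushes the masked model to forget the PII batch while preserving behavior on general text, and the zeroed positions are read off as candidate privacy neurons.

```python
# Minimal sketch, not the paper's implementation: learn a binary neuron mask
# that trades off forgetting PII against preserving general behavior.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for one hidden layer of an LM; real use would mask every layer.
hidden, vocab = 64, 100
model = nn.Sequential(nn.Embedding(vocab, hidden),
                      nn.Linear(hidden, hidden), nn.ReLU(),
                      nn.Linear(hidden, vocab))
for p in model.parameters():
    p.requires_grad_(False)  # the LM is frozen; only the mask is learned

# Positive init so training starts with all neurons active (sigmoid(2) ~ 0.88).
mask_logits = nn.Parameter(torch.full((hidden,), 2.0))

def hard_mask(logits: torch.Tensor) -> torch.Tensor:
    # Hard 0/1 mask in the forward pass; gradients flow through the
    # sigmoid via a straight-through estimator.
    probs = torch.sigmoid(logits)
    hard = (probs > 0.5).float()
    return hard + probs - probs.detach()

def masked_forward(tokens: torch.Tensor) -> torch.Tensor:
    h = model[0](tokens)                # token embeddings
    h = model[2](model[1](h))           # hidden activations (Linear + ReLU)
    h = h * hard_mask(mask_logits)      # deactivate masked neurons
    return model[3](h)                  # logits over the vocabulary

# Hypothetical data: "PII" examples the model memorized vs. general text.
pii_x, pii_y = torch.randint(0, vocab, (8,)), torch.randint(0, vocab, (8,))
gen_x, gen_y = torch.randint(0, vocab, (32,)), torch.randint(0, vocab, (32,))

opt = torch.optim.Adam([mask_logits], lr=0.1)
lambda_sparse = 0.05  # assumed sparsity weight: remove as few neurons as possible
for step in range(200):
    opt.zero_grad()
    # Adversarial-style trade-off (our reading of the abstract): raise the
    # loss on PII (forget it) while keeping the general loss low.
    pii_loss = F.cross_entropy(masked_forward(pii_x), pii_y)
    gen_loss = F.cross_entropy(masked_forward(gen_x), gen_y)
    removal = (1.0 - torch.sigmoid(mask_logits)).mean()  # fraction removed
    loss = -pii_loss + gen_loss + lambda_sparse * removal
    loss.backward()
    opt.step()

# Neurons the mask switches off are the candidate privacy neurons.
privacy_neurons = (torch.sigmoid(mask_logits) <= 0.5).nonzero().squeeze(-1)
print(f"{len(privacy_neurons)} candidate privacy neurons:", privacy_neurons.tolist())
```

Under this reading, deactivation for risk mitigation simply means keeping the learned hard mask applied at inference time, permanently zeroing the localized neurons.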