Large language models have been shown to memorize private information, such as social security numbers, from their training data. Given the sheer scale of the training corpus, it is challenging to screen and filter out such private data, either
manually or automatically. In this paper, we propose Confidentially Redacted
Training (CRT), a method for training language generation models while protecting confidential text segments. We borrow ideas from differential privacy (which
solves a related but distinct problem) and show that our method provably prevents unintended memorization by randomizing parts of the training process. Moreover, we show that redaction with an approximately correct
screening policy amplifies the confidentiality guarantee. We implement the
method for both LSTM and GPT language models. Our experimental results show that models trained with CRT achieve nearly the same perplexity as standard training while providing strong confidentiality.