This position paper investigates the integration of Differential Privacy (DP)
in the training of Mixture of Experts (MoE) models within the field of natural
language processing. As Large Language Models (LLMs) scale to billions of
parameters, leveraging expansive datasets, they exhibit enhanced linguistic
capabilities and emergent abilities. However, this growth raises significant
computational and privacy concerns. Our study addresses these issues by
exploring the potential of MoE models, known for their computational
efficiency, and the application of DP, a standard for privacy preservation. We
present the first known attempt to train MoE models under the constraints of
DP, addressing the unique challenges posed by their architecture and the
complexities of DP integration. Our initial experimental studies demonstrate
that MoE models can be effectively trained with DP, achieving performance that
is competitive with their non-private counterparts. This initial study aims to
provide valuable insights and ignite further research in the domain of
privacy-preserving MoE models, softly laying the groundwork for prospective
developments in this evolving field.
外部データセット
SST-2
MNLI
参考文献
Proceedings of the 2016 ACM Conference on Computer and Communications Security
Deep Learning with Differential Privacy
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., Zhang, L.
T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei
Published: 2020
30th USENIX Security Symposium
Extracting Training Data from Large Language Models
Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, U., Oprea, A., Raffel, C.
Published: 2021
Proceedings of Machine Learning Research
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Du, N., Huang, Y., Dai, A. M., Tong, S., Lepikhin, D., Xu, Y., Krikun, M., Zhou, Y., Yu, A. W., Firat, O., Zoph, B., Fedus, L., Bosma, M. P., Zhou, Z., Wang, T., Wang, Y. E., Webster, K., Pellat, M., Robinson, K., Meier-Hellstern, K. S., Duke, T., Dixon, L., Zhang, K., Le, Q. V., Wu, Y., Chen, Z., Cui, C.