Privacy-preserving machine learning has attracted increasing attention
recently, especially as various privacy regulations have come into force. In
this context, Federated Learning (FL) has emerged to facilitate
privacy-preserving joint modeling among multiple parties. Although many
federated algorithms have been studied extensively, secure and practical
gradient tree boosting models (e.g., XGB) are still lacking in the literature.
In this paper, we aim to build large-scale secure XGB under the vertical
federated learning setting. We guarantee data privacy from three aspects.
Specifically, (i) we employ secure multi-party computation techniques to avoid
leaking intermediate information during training, (ii) we store the output
model in a distributed manner in order to minimize information release, and
(iii) we provide a novel algorithm for secure XGB prediction with the distributed
model. Furthermore, by proposing secure permutation protocols, we improve
training efficiency and enable the framework to scale to large datasets. We
conduct extensive experiments on both public datasets and real-world datasets,
and the results demonstrate that our proposed XGB models provide not only
competitive accuracy but also practical performance.