Federated learning (FL) is a privacy-preserving learning paradigm that allows
multiple parties to jointly train a powerful machine learning model without
sharing their private data. According to the form of collaboration, FL can be
further divided into horizontal federated learning (HFL) and vertical federated
learning (VFL). In HFL, participants share the same feature space and
collaborate on data samples, while in VFL, participants share the same sample
IDs and collaborate on features. VFL has a broader scope of applications and is
arguably more suitable for joint model training between large enterprises.
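The difference between the two partitioning schemes can be illustrated with a minimal sketch. It assumes a toy dataset keyed by hypothetical user IDs with made-up feature values. The helpers `horizontal_split` and `vertical_split` are hypothetical illustrations, not part of any FL framework. HFL parties end up with disjoint sample IDs over the same features, while VFL parties hold the same sample IDs over disjoint features.

```python
# Toy dataset (hypothetical values): each key is a sample ID, each inner dict
# maps feature names to values for that sample.
dataset = {
    "u1": {"age": 34, "income": 52, "credit_score": 710},
    "u2": {"age": 27, "income": 38, "credit_score": 655},
    "u3": {"age": 45, "income": 80, "credit_score": 760},
    "u4": {"age": 51, "income": 61, "credit_score": 690},
}


def horizontal_split(data, id_groups):
    """Hypothetical HFL partition: each party gets a subset of the samples,
    but every party keeps the full, shared feature space."""
    return [{uid: dict(data[uid]) for uid in ids} for ids in id_groups]


def vertical_split(data, feature_groups):
    """Hypothetical VFL partition: every party holds all sample IDs,
    but each party keeps only its own subset of the features."""
    return [
        {uid: {f: row[f] for f in feats} for uid, row in data.items()}
        for feats in feature_groups
    ]


hfl_parties = horizontal_split(dataset, [["u1", "u2"], ["u3", "u4"]])
vfl_parties = vertical_split(dataset, [["age", "income"], ["credit_score"]])

# HFL: disjoint sample IDs, identical feature names.
print([sorted(p) for p in hfl_parties])  # [['u1', 'u2'], ['u3', 'u4']]
# VFL: identical sample IDs, disjoint feature names.
print([sorted(p["u1"]) for p in vfl_parties])  # [['age', 'income'], ['credit_score']]
```

Because the VFL parties share sample IDs, their features can only be combined by aligning rows on those IDs, which is why VFL protocols must exchange intermediate values computed over the same samples.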
In this paper, we focus on VFL and investigate potential privacy leakage in
real-world VFL frameworks. We design and implement two practical privacy
attacks: the reverse multiplication attack against the logistic regression VFL
protocol, and the reverse sum attack against the XGBoost VFL protocol. We
empirically show that the two attacks are (1) effective: the adversary can
successfully steal the private training data, even when the intermediate
outputs are encrypted to protect data privacy; (2) evasive: the attacks neither
deviate from the protocol specification nor degrade the accuracy of the target
model; and (3) easy: the adversary needs little prior knowledge about the data
distribution of the target participant. We also show that the leaked information is
as effective as the raw training data in training an alternative classifier. We
further discuss potential countermeasures and their challenges, which we hope
can lead to several promising research directions.