Vertical federated learning (vFL) has attracted much attention and has been
deployed in recent years to solve machine learning problems with data privacy
concerns. However, recent work has demonstrated that vFL is vulnerable to
privacy leakage even though only the forward intermediate embedding (rather
than raw features) and backpropagated gradients (rather than raw labels) are
communicated between the involved participants. As the raw labels often contain
highly sensitive information, several defenses have recently been proposed to
prevent label leakage from the backpropagated gradients in vFL. However, these
defenses address only the threat of label leakage from the backpropagated
gradients; none of them considers the problem of label leakage from the
intermediate embedding. In this paper, we propose a practical label inference
method that can effectively steal private labels from the shared intermediate
embedding, even when existing protection methods such as label differential
privacy and gradient perturbation are applied. The effectiveness of this
attack stems from the correlation between the intermediate embedding and the
corresponding private labels. To mitigate label leakage from the forward
embedding, we
add an additional optimization objective at the label party that limits the
adversary's label-stealing ability by minimizing the distance correlation
between the intermediate embedding and the corresponding private labels. We
conduct extensive experiments to demonstrate the effectiveness of our proposed
protection methods.
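Since the defense hinges on minimizing the distance correlation between the intermediate embedding and the private labels, a minimal NumPy sketch of the sample distance correlation statistic (Székely et al.'s dCor, computed here as a plain evaluation, not the differentiable training-time version) may clarify the quantity being penalized. The function names are illustrative and not taken from the paper.

```python
import numpy as np

def _double_centered_dist(x):
    """Pairwise Euclidean distance matrix of the rows of x, double-centered."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between two 2-D arrays with matching row counts.

    Returns a value in [0, 1]; 0 indicates independence in the population limit,
    1 indicates a perfect (possibly nonlinear) dependence.
    """
    A = _double_centered_dist(x)
    B = _double_centered_dist(y)
    dcov2 = (A * B).mean()                 # squared sample distance covariance
    dvar_x = (A * A).mean()                # squared distance variances
    dvar_y = (B * B).mean()
    if dvar_x * dvar_y == 0.0:
        return 0.0                         # a constant input carries no dependence
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))
```

In the defense described above, the label party would add a term of the form lambda * dCor(embedding, labels) to its training loss, so that gradient descent drives the shared embedding toward statistical independence from the private labels; the snippet here only evaluates the statistic.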