Federated learning of deep learning models for supervised tasks, e.g. image
classification and segmentation, has found many applications: for example in
human-in-the-loop tasks such as film post-production where it enables sharing
of domain expertise of human artists in an efficient and effective fashion. In
many such applications, we need to protect the training data from being leaked
when gradients are shared in the training process due to IP or privacy
concerns. Recent works have demonstrated that it is possible to reconstruct the
training data from gradients for an image-classification model when its
architecture is known. However, there is still an incomplete theoretical
understanding of the efficacy and failure of such attacks. In this paper, we
analyse the source of training-data leakage from gradients. We formulate the
problem of training data reconstruction as solving an optimisation problem
iteratively for each layer. The layer-wise objective function is primarily
defined by weights and gradients from the current layer as well as the output
from the reconstruction of the subsequent layer, but it might also involve a
'pull-back' constraint from the preceding layer. Training data can be
reconstructed when we solve the problem backward from the output of the network
through each layer. Based on this formulation, we are able to attribute the
potential leakage of the training data in a deep network to its architecture.
We also propose a metric to measure the level of security of a deep learning
model against gradient-based attacks on the training data.