Federated learning (FL) is a distributed machine learning paradigm that
allows clients to collaboratively train a model over their own local data. FL
promises to preserve clients' privacy, and its security can be further
strengthened by cryptographic methods such as additively homomorphic
encryption (HE). However, the efficiency of FL can suffer severely from
statistical heterogeneity, which arises both from the discrepancy among
clients' local data distributions and from the skewness of the global data
distribution. We mathematically demonstrate the cause of this performance
degradation and empirically examine FL's performance over various datasets. To
tackle the statistical heterogeneity problem, we propose a pluggable
system-level client selection method named Dubhe, which allows clients to
proactively participate in training while preserving their privacy with the
assistance of HE. Experimental results show that Dubhe is comparable to the
optimal greedy method in classification accuracy, with negligible
encryption and communication overhead.
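As a loose illustration of the additively homomorphic property the abstract
refers to (a minimal sketch of the general idea, not Dubhe's actual protocol;
the per-client statistics and the use of the third-party Paillier library
`phe` are assumptions of this example), the snippet below shows how a server
can sum encrypted client statistics without learning any individual value.

    # Sketch of additively homomorphic aggregation with Paillier,
    # via the `phe` library (pip install phe). Not the paper's protocol.
    from phe import paillier

    # Hypothetical per-client statistics (e.g., label counts for one class).
    client_stats = [12, 7, 30, 5]

    public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

    # Each client encrypts its own statistic under the shared public key.
    ciphertexts = [public_key.encrypt(x) for x in client_stats]

    # The server adds ciphertexts directly: Enc(a) + Enc(b) = Enc(a + b),
    # so it never sees any individual client's plaintext statistic.
    encrypted_sum = sum(ciphertexts[1:], ciphertexts[0])

    # Decrypting the aggregate reveals only the sum of all inputs.
    assert private_key.decrypt(encrypted_sum) == sum(client_stats)  # 54

Only the holder of the private key can recover the aggregate, so the server
learns the global statistic it needs for selection without observing any
single client's contribution.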