Federated Learning (FL) is a machine learning framework that enables multiple
organizations to train a model without sharing their data with a central
server. However, it experiences significant performance degradation if the data
is non-identically independently distributed (non-IID). This is a problem in
medical settings, where variations in the patient population contribute
significantly to distribution differences across hospitals. Personalized FL
addresses this issue by accounting for site-specific distribution differences.
Clustered FL, a Personalized FL variant, was used to address this problem by
clustering patients into groups across hospitals and training separate models
on each group. However, privacy concerns remained as a challenge as the
clustering process requires exchange of patient-level information. This was
previously solved by forming clusters using aggregated data, which led to
inaccurate groups and performance degradation. In this study, we propose
Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel
Clustered FL framework that can cluster patients using patient-level data while
protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic
technique, to securely calculate patient-level similarity scores across
hospitals. We then evaluate PCBFL by training a federated mortality prediction
model using 20 sites from the eICU dataset. We compare the performance gain
from PCBFL against traditional and existing Clustered FL frameworks. Our
results show that PCBFL successfully forms clinically meaningful cohorts of
low, medium, and high-risk patients. PCBFL outperforms traditional and existing
Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC
improvement of 7.8%.