Abstract
Machine learning on large-scale genomic or transcriptomic data is important
for many novel health applications. For example, precision medicine tailors
medical treatments to patients on the basis of individual biomarkers, cellular
and molecular states, etc. However, the data required is sensitive, voluminous,
heterogeneous, and typically distributed across locations where dedicated
machine learning hardware is not available. Due to privacy and regulatory
reasons, it is also problematic to aggregate all data at a trusted third
party. Federated learning is a promising solution to this dilemma, because it
enables decentralized, collaborative machine learning without exchanging raw
data. In this paper, we perform comparative experiments with the federated
learning frameworks TensorFlow Federated and Flower. Our test case is the
training of disease prognosis and cell type classification models. We train the
models with distributed transcriptomic data, considering both data
heterogeneity and architectural heterogeneity. We measure model quality,
robustness against privacy-enhancing noise, computational performance and
resource overhead. Each of the federated learning frameworks has different
strengths. However, our experiments confirm that both frameworks can readily
build models on transcriptomic data, without transferring personal raw data to
a third party with abundant computational resources.
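
To make the idea of training without exchanging raw data concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration only, not the API of TensorFlow Federated or Flower and not the setup used in the experiments. The toy linear model, the function names, and the `sigma` parameter are all assumptions made for this example. The Gaussian noise step stands in for privacy-enhancing noise in the simplest possible way; it is not a calibrated differential privacy mechanism (there is no clipping or privacy accounting).

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Hypothetical client step: gradient descent on a toy model y = w*x + b.

    Only the updated weights are returned; `data` stays on the client.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def add_gaussian_noise(weights, sigma, rng):
    """Perturb each weight with zero-mean Gaussian noise (illustrative only)."""
    return tuple(w + rng.gauss(0.0, sigma) for w in weights)

def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    )

def run_federated_training(clients, rounds=50, sigma=0.0, seed=0):
    """Run several rounds; the server only ever sees (noisy) weights."""
    rng = random.Random(seed)
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [
            add_gaussian_noise(local_update(global_weights, data), sigma, rng)
            for data in clients
        ]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

# Two hypothetical sites holding different samples of the same relation y = 2x + 1.
site_a = [(x, 2 * x + 1) for x in (-1.0, 0.0, 1.0)]
site_b = [(x, 2 * x + 1) for x in (-2.0, -1.0, 1.0, 2.0)]
w, b = run_federated_training([site_a, site_b])
```

In this sketch the two sites hold different inputs, yet the averaged model recovers the shared relation without either site sending its samples to the server. Raising `sigma` lets one observe how model quality degrades as the noise on the exchanged weights grows.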
External Datasets
Acute Myeloid Leukemia data set
Expression profiles generated by single-cell RNA-Seq for cell types of the human brain