Federated Learning on Transcriptomic Data: Model Quality and Performance Trade-Offs

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2402.14527

PDF

https://arxiv.org/pdf/2402.14527

Paper Information

Authors: Anika Hannemann; Jan Ewald; Leo Seeger; Erik Buchmann
Published: 2-22-2024
Affiliation: Dept. of Computer Science, Leipzig University
Country: Germany
Conference: IEEE International Conference on Communication Systems (ICCS)

Labels Estimated by AI

Federated Learning; Data Privacy Assessment; Data Preprocessing

These labels were automatically added by AI and may be inaccurate.

Abstract

Machine learning on large-scale genomic or transcriptomic data is important for many novel health applications. For example, precision medicine tailors medical treatments to patients on the basis of individual biomarkers, cellular and molecular states, etc. However, the data required is sensitive, voluminous, heterogeneous, and typically distributed across locations where dedicated machine learning hardware is not available. Due to privacy and regulatory reasons, it is also problematic to aggregate all data at a trusted third party. Federated learning is a promising solution to this dilemma, because it enables decentralized, collaborative machine learning without exchanging raw data. In this paper, we perform comparative experiments with the federated learning frameworks TensorFlow Federated and Flower. Our test case is the training of disease prognosis and cell type classification models. We train the models with distributed transcriptomic data, considering both data heterogeneity and architectural heterogeneity. We measure model quality, robustness against privacy-enhancing noise, computational performance and resource overhead. Each of the federated learning frameworks has different strengths. However, our experiments confirm that both frameworks can readily build models on transcriptomic data, without transferring personal raw data to a third party with abundant computational resources.
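
The idea of training without exchanging raw data, summarized in the abstract, can be illustrated with a minimal sketch. It is not the paper's experimental setup and uses neither the TensorFlow Federated nor the Flower API. It assumes federated averaging as the aggregation rule, a toy one-feature linear model in place of the disease prognosis and cell type classifiers, and plain Gaussian noise on the client updates as a stand-in for privacy-enhancing noise (uncalibrated, so it gives no formal privacy guarantee). All function names and data values are hypothetical.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Gradient descent on one client's private (x, y) pairs; data never leaves the client."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def add_noise(weights, sigma, rng):
    """Perturb an update with Gaussian noise (illustrative only, not calibrated DP)."""
    return tuple(v + rng.gauss(0.0, sigma) for v in weights)


def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local sample count."""
    total = sum(client_sizes)
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(2)
    )


def train(clients, rounds=200, sigma=0.0, seed=0):
    """Run federated rounds; only model parameters are sent to the server."""
    rng = random.Random(seed)
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        updates = [
            add_noise(local_update(global_weights, data), sigma, rng)
            for data in clients
        ]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_client(xs):
    """Hypothetical client data drawn from y = 2x + 1."""
    return [(x, 2.0 * x + 1.0) for x in xs]


# Two clients with different feature ranges, a simple form of data heterogeneity.
clients = [make_client([-1.0, -0.5, 0.0]), make_client([0.0, 0.5, 1.0, 1.5])]

w, b = train(clients)                            # no noise
w_noisy, b_noisy = train(clients, sigma=0.05)    # noisy client updates
print(f"no noise:   w={w:.4f}, b={b:.4f}")
print(f"with noise: w={w_noisy:.4f}, b={b_noisy:.4f}")
```

Comparing the two runs gives a rough feel for the trade-off the paper measures: the noise-free model recovers the underlying line, while the noisy model stays close to it but fluctuates with the noise scale `sigma`.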

External Datasets

Acute Myeloid Leukemia data set

Expression profiles generated by single-cell RNA-Seq for cell types of the human brain