Speech Emotion Recognition (SER) applications frequently raise privacy
concerns, as they often acquire speech data on the client side and transmit it
to remote cloud platforms for further processing. This speech data can reveal
not only the speech content and affective information but also the speaker's
identity, demographic traits, and health status. Federated learning (FL) is a
distributed machine learning algorithm that coordinates clients to train a
model collaboratively without sharing local data. This approach shows great
promise for SER applications, since sharing raw speech or speech features from
a user's device is vulnerable to privacy attacks. However, a major challenge
in FL is the limited availability of high-quality labeled data. In this work,
we propose a semi-supervised federated learning framework, Semi-FedSER, that
utilizes both labeled and unlabeled data samples to address the challenge of
limited labeled data samples in FL. We show that Semi-FedSER achieves the
desired SER performance even at a local label rate of l = 20%, using two SER
benchmark datasets: IEMOCAP and MSP-Improv.
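
The combination of federated averaging with local pseudo-labeling described
above can be illustrated with a minimal sketch. This is not the paper's
Semi-FedSER implementation: the linear softmax classifier, the confidence
threshold, the client/feature sizes, and all function names are illustrative
assumptions; it only shows the general pattern of clients training on labeled
plus confidently pseudo-labeled data and a server averaging their weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, N_FEATS = 4, 16  # e.g. 4 emotion classes, 16-dim features (assumed)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_update(w, x_lab, y_lab, x_unlab, lr=0.1, thresh=0.9, steps=20):
    """One client's update: cross-entropy on labeled data plus
    pseudo-labeled loss on confident unlabeled samples (illustrative)."""
    for _ in range(steps):
        # Pseudo-label unlabeled samples the current model is confident on.
        p = softmax(x_unlab @ w)
        keep = p.max(axis=1) >= thresh
        x = np.vstack([x_lab, x_unlab[keep]])
        y = np.concatenate([y_lab, p[keep].argmax(axis=1)])
        # One gradient step on softmax cross-entropy for the combined batch.
        probs = softmax(x @ w)
        probs[np.arange(len(y)), y] -= 1.0
        w = w - lr * x.T @ probs / len(y)
    return w

def fedavg_round(w_global, clients):
    """Server round: broadcast weights, collect client updates, average."""
    updates = [local_update(w_global.copy(), *c) for c in clients]
    return np.mean(updates, axis=0)

# Toy data: 3 clients, each with a 20% local label rate (l = 20%).
def make_client(n=50, label_rate=0.2):
    x = rng.normal(size=(n, N_FEATS))
    y = rng.integers(0, N_CLASSES, size=n)
    n_lab = int(n * label_rate)
    return x[:n_lab], y[:n_lab], x[n_lab:]

clients = [make_client() for _ in range(3)]
w = np.zeros((N_FEATS, N_CLASSES))
for _ in range(5):
    w = fedavg_round(w, clients)
print(w.shape)  # (16, 4)
```

Note that only model weights leave each client; the raw features and
pseudo-labels stay local, which is the privacy property the abstract appeals
to.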