Che Liwei, Wang Jiaqi, Zhou Yao, Ma Fenglong
College of Information Sciences and Technology, Pennsylvania State University, University Park, PA 16802, USA.
Instacart, San Francisco, CA 94105, USA.
Sensors (Basel). 2023 Aug 6;23(15):6986. doi: 10.3390/s23156986.
Federated learning (FL), which provides a collaborative training scheme for distributed data sources with privacy concerns, has become a fast-growing and attractive research area. Most existing FL studies take unimodal data, such as images or text, as the model input and focus on resolving the heterogeneity challenge, i.e., the non-identically distributed (non-IID) data problem caused by imbalanced data distributions with respect to labels and data amount. In real-world applications, however, data are usually described by multiple modalities. To the best of our knowledge, only a handful of studies have sought to improve system performance by utilizing multimodal data. In this survey paper, we identify the significance of the emerging research topic of multimodal federated learning (MFL) and present a literature review of state-of-the-art MFL methods. Furthermore, we categorize MFL into congruent and incongruent MFL based on whether all clients possess the same combination of modalities. We also investigate feasible application tasks and related benchmarks for MFL. Lastly, we summarize the promising directions and fundamental challenges in this field for future research.