Song Haoyue, Wang Jiacheng, Zhou Jianjun, Wang Liansheng
IEEE Trans Med Imaging. 2025 Apr;44(4):1931-1941. doi: 10.1109/TMI.2024.3523378. Epub 2025 Apr 3.
Multimodal Federated Learning (MFL) has emerged as a collaborative paradigm for training models across decentralized devices, harnessing various data modalities to facilitate effective learning while respecting data ownership. Within this field, there has been a pivotal shift from homogeneous to heterogeneous MFL. The former assumes that all clients share the same input modalities, whereas the latter accommodates modality-incongruous setups, which are common in practice. For example, some advanced medical institutions can use both MRI and CT for disease diagnosis, while remote hospitals are often limited to CT alone because of its cost-effectiveness. Although heterogeneous MFL applies to a broader range of scenarios, it introduces a new challenge: modality-heterogeneous client drift, which arises from diverse modality-coupled local optimization. To address this, we introduce FedMM, a simple yet effective approach. During local optimization, FedMM employs modality dropout, which randomly masks available modalities to promote weight alignment while preserving the model's expressivity on its original modality combination. To enhance the modality dropout process, FedMM incorporates a task-specific inter- and intra-modal regularizer. It acts as an additional constraint that forces the weight distribution to remain more consistent across diverse input modalities, and therefore eases optimization when modality dropout is enabled. Together, these two components address client drift holistically: they foster convergence among client models while accounting for each client's unique input modalities, which improves heterogeneous MFL performance. Comprehensive evaluations on three medical image segmentation datasets demonstrate FedMM's superiority over state-of-the-art heterogeneous MFL methods.
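The modality dropout step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `modality_dropout`, the `p_drop` parameter, the dictionary-of-lists sample layout, zero-filling as the masking operation, and the rule that at least one modality is always kept are all assumptions made for illustration, since the abstract does not specify these details.

```python
import random


def modality_dropout(sample, p_drop=0.5, rng=random):
    """Randomly mask available modalities, always keeping at least one.

    sample: dict mapping a modality name to a list of floats, or to None
            when the client does not have that modality (hypothetical layout).
    Returns (masked_sample, kept), where dropped modalities are zero-filled.
    Zero-filling is an assumption; the abstract only says "masking".
    """
    available = [m for m, x in sample.items() if x is not None]
    if not available:
        raise ValueError("sample has no available modality")

    # Keep each available modality independently with probability 1 - p_drop.
    kept = [m for m in available if rng.random() >= p_drop]
    if not kept:
        # Assumption: never drop everything, so the model still sees an input.
        kept = [rng.choice(available)]

    masked = {}
    for m, x in sample.items():
        if x is None:
            masked[m] = None              # modality missing on this client
        elif m in kept:
            masked[m] = list(x)           # modality passed through unchanged
        else:
            masked[m] = [0.0] * len(x)    # modality masked out
    return masked, kept
```

Under these assumptions, a client holding both MRI and CT would sometimes train on CT alone, which is the input combination a CT-only client always sees; a CT-only client is left unchanged because its single modality is never dropped.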