Shen Ning, Xu Tingfa, Huang Shiqi, Chen Zhenxiang, Li Jianan
IEEE Trans Med Imaging. 2025 May 14;PP. doi: 10.1109/TMI.2025.3570054.
Recent advancements in semi-supervised federated learning (SSFL) have significantly enhanced public health services by enabling medical institutions to share model updates via a central server. However, most SSFL approaches are based on conservative assumptions, such as labels-at-server and labels-at-client, which fail to fully capture the complex and diverse data distributions inherent in medical institutions. To address this limitation, we introduce a novel application of SSFL tailored to a realistic client data scenario, encompassing clients with fully labeled, partially labeled, and fully unlabeled data. This approach effectively navigates varying levels of data annotation by maximizing the utility of unlabeled samples within the client federation. To tackle the challenges posed by such a complex scenario, we propose a new SSFL framework, FedCD. FedCD incorporates three client-distilled models, each corresponding to a distinct client data distribution, alongside server-client federation. First, each client-distilled model condenses the diverse parameters of the client federation into robust knowledge through distillation. The contribution of each client model is then dynamically adjusted based on its proximity to the client-distilled model, ensuring that the framework adapts to the heterogeneous characteristics of individual clients. By aggregating client-distilled models, FedCD implements model drift correction, effectively mitigating parameter drift across heterogeneous models. This dynamic federated approach not only harnesses unlabeled data efficiently but also accommodates diverse annotation levels while adapting to varying data distributions. Extensive experiments on two medical image segmentation tasks and one classification task demonstrate the superiority of our method, highlighting its ability to address realistic challenges in medical data scenarios.
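The proximity-based adjustment of client contributions can be illustrated with a minimal sketch. The abstract does not state the distance measure or the weighting function, so everything here is an assumption made for illustration: parameters are treated as flat vectors, proximity is Euclidean distance, and weights come from a softmax over negative distances. The names `proximity_weights`, `aggregate`, and `temperature` are hypothetical and are not taken from FedCD.

```python
import math
from typing import List, Sequence


def proximity_weights(clients: Sequence[Sequence[float]],
                      reference: Sequence[float],
                      temperature: float = 1.0) -> List[float]:
    """Weight each client by its closeness to a reference parameter vector.

    Assumption: Euclidean distance and a softmax over negative distances.
    The abstract only says contributions depend on proximity.
    """
    distances = [math.dist(c, reference) for c in clients]
    # Shift by the smallest distance so the largest exponent is exp(0) = 1.
    d_min = min(distances)
    scores = [math.exp(-(d - d_min) / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]


def aggregate(clients: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Weighted average of client parameter vectors, coordinate by coordinate."""
    dim = len(clients[0])
    return [sum(w * c[i] for w, c in zip(weights, clients)) for i in range(dim)]


# Toy example: the third client has drifted far from the reference.
toy_clients = [[1.0, 0.0], [0.9, 0.1], [5.0, 5.0]]
toy_reference = [1.0, 0.0]  # stands in for a client-distilled model
toy_weights = proximity_weights(toy_clients, toy_reference)
print(toy_weights)
print(aggregate(toy_clients, toy_weights))
```

In this toy setting the drifted client receives a much smaller weight than the two clients near the reference, so the aggregate stays close to them. A real system would apply the same idea per layer or per tensor instead of to a single flat vector.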