Zhang Hanwen, Chen Mingzhi, Liu Yuxi, Luo Guibo, Zhu Yuesheng
IEEE J Biomed Health Inform. 2025 Jul;29(7):5042-5055. doi: 10.1109/JBHI.2025.3549029.
Learning a high-performance global model from multi-center medical datasets is challenging because of privacy protection requirements and data heterogeneity in healthcare systems. Current federated learning approaches learn inefficiently from non-independent and identically distributed (Non-IID) data and incur high communication costs. In this work, we propose a practical privacy computing framework for training a medical image segmentation model on Non-IID data under various multi-center settings at low communication cost. Specifically, an efficient cascaded diffusion model is trained to generate image-mask pairs whose distribution is similar to that of the clients' training data, providing rich labeled data on the client side to mitigate heterogeneity. A label construction module is also developed to improve the quality of the generated image-mask pairs. Moreover, a set of aggregation methods is proposed to obtain a global model from the data generated by the cascaded diffusion model in diverse scenarios: CD-Syn, CD-Ens, and its extension CD-KD. CD-Syn is a one-shot method that trains the segmentation model solely on public generated datasets, while CD-Ens and CD-KD maximize the utilization of local original data through an extra communication round of ensembling or knowledge distillation. The proposed framework is therefore highly practical: its multiple aggregation methods adapt flexibly to varying demands for efficiency, privacy, and accuracy. We systematically evaluated the framework on five Non-IID medical datasets and observed an average improvement of 5.38% in Dice score over the baseline method (FednnU-Net).
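The Dice score used for evaluation, and the general idea of ensembling client predictions as in CD-Ens, can be illustrated with a small NumPy sketch. The abstract does not specify how CD-Ens combines client models, so averaging per-client foreground probabilities and thresholding at 0.5 is an assumption made here purely for illustration. The function names `dice_score` and `ensemble_mask` and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks are empty; treating this as perfect agreement
        # is a common convention, assumed here.
        return 1.0
    return float(2.0 * intersection / denom)


def ensemble_mask(client_probs, threshold=0.5):
    """Assumed ensembling rule: average the per-client foreground
    probability maps, then threshold to get a binary mask."""
    mean_prob = np.mean(np.stack(client_probs, axis=0), axis=0)
    return mean_prob >= threshold


# Toy example: two hypothetical clients predict on one 2x2 image.
client_probs = [
    np.array([[0.9, 0.2], [0.6, 0.1]]),
    np.array([[0.7, 0.6], [0.2, 0.3]]),
]
ground_truth = np.array([[1, 0], [1, 0]])

mask = ensemble_mask(client_probs)
print(f"Dice of ensembled mask: {dice_score(mask, ground_truth):.3f}")
```

In this toy case only the top-left pixel survives averaging and thresholding, so the ensembled mask overlaps one of the two ground-truth pixels. A real multi-class 3D segmentation pipeline would compute Dice per class and per volume.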