Tölle Malte, Garthe Philipp, Scherer Clemens, Seliger Jan Moritz, Leha Andreas, Krüger Nina, Simm Stefan, Martin Simon, Eble Sebastian, Kelm Halvar, Bednorz Moritz, André Florian, Bannas Peter, Diller Gerhard, Frey Norbert, Groß Stefan, Hennemuth Anja, Kaderali Lars, Meyer Alexander, Nagel Eike, Orwat Stefan, Seiffert Moritz, Friede Tim, Seidler Tim, Engelhardt Sandy
DZHK (German Centre for Cardiovascular Research), partner site Heidelberg/Mannheim, Heidelberg, Germany.
Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany.
NPJ Digit Med. 2025 Feb 6;8(1):88. doi: 10.1038/s41746-025-01434-3.
Federated learning is a well-established technique for utilizing decentralized data while preserving privacy. However, real-world deployments often face challenges such as partially labeled datasets, where only a few sites hold certain expert annotations, leaving large portions of the data unlabeled and unused. Leveraging this unlabeled data could strengthen transformer architectures in regimes with small, diversely annotated sets. We conduct the largest federated cardiac CT analysis to date (n = 8104) in a real-world setting across eight hospitals. Our two-step semi-supervised strategy distills knowledge from task-specific CNNs into a transformer: first, the CNNs predict on the unlabeled data for each label type; then, the transformer learns from these predictions via label-specific heads. This improves predictive accuracy, enables simultaneous learning of all partial labels across the federation, and outperforms UNet-based models in generalizability on downstream tasks. Code and model weights are made openly available to support future cardiac CT analysis.
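The two-step strategy in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the label types, site data, and the `teacher_predict` stand-in are invented for demonstration, and the task-specific CNN teachers and transformer student are replaced by trivial placeholders. The point shown is the pseudo-labeling step that lets every site expose all label types to a multi-head student.

```python
import numpy as np

# Hypothetical sketch of the two-step semi-supervised strategy:
# Step 1: task-specific teachers pseudo-label the data each site lacks
#         annotations for;
# Step 2: a shared multi-head student can then train on every label type
#         at every site (real labels where available, pseudo-labels else).

rng = np.random.default_rng(0)
LABEL_TYPES = ["ventricle", "valve", "calcification"]  # invented names

# Each "site" holds data annotated for only a subset of label types,
# mimicking the partially labeled federation described in the abstract.
sites = [
    {"x": rng.normal(size=(4, 8)), "labels": {"ventricle": rng.integers(0, 2, 4)}},
    {"x": rng.normal(size=(4, 8)), "labels": {"valve": rng.integers(0, 2, 4)}},
]

def teacher_predict(x, label_type):
    """Stand-in for a task-specific CNN teacher: emits pseudo-labels."""
    return (x.sum(axis=1) > 0).astype(int)

# Step 1: fill in each missing label type with teacher pseudo-labels.
for site in sites:
    for lt in LABEL_TYPES:
        if lt not in site["labels"]:
            site["labels"][lt] = teacher_predict(site["x"], lt)

# Step 2 (not shown): the student's label-specific heads now have a
# target for every label type at every site; a per-head mask could
# still distinguish real annotations from distilled ones.
assert all(set(site["labels"]) == set(LABEL_TYPES) for site in sites)
```

In the paper's setting the teachers are CNNs trained at the sites that do hold a given annotation type, and the student is a transformer with one output head per label type; the sketch only captures the data-flow of the distillation.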