Nvidia Corporation, 4500 East West Highway, Bethesda, Maryland 20814, USA.
Molecular Imaging Branch, National Cancer Institute, NIH, Bethesda, MD, USA; Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Molecular Imaging Branch, National Cancer Institute, NIH, Bethesda, MD, USA.
Med Image Anal. 2021 May;70:101992. doi: 10.1016/j.media.2021.101992. Epub 2021 Feb 6.
The recent outbreak of Coronavirus Disease 2019 (COVID-19) has led to an urgent need for reliable diagnosis and management of SARS-CoV-2 infection. Current guidelines recommend RT-PCR for testing. As a complementary diagnostic imaging tool, chest Computed Tomography (CT) has been shown to reveal visual patterns characteristic of COVID-19, which has definite value at several stages of the disease course. To facilitate CT analysis, recent efforts have focused on computer-aided characterization and diagnosis with chest CT scans, which have shown promising results. However, domain shift of data across clinical data centers poses a serious challenge when deploying learning-based models. A common way to alleviate this issue is to fine-tune the model locally with the target domain's local data and annotations. Unfortunately, the availability and quality of local annotations usually vary due to heterogeneity in equipment and the distribution of medical resources across the globe. This impact may be pronounced in the detection of COVID-19, since the relevant patterns vary in size, shape, and texture. In this work, we attempt to find a solution to this challenge via federated and semi-supervised learning. A multi-national database consisting of 1704 scans from three countries is adopted to study the performance gap when training a model with one dataset and applying it to another. Expert radiologists manually delineated 945 scans for COVID-19 findings. To handle the variability in both the data and annotations, a novel federated semi-supervised learning technique is proposed to fully utilize all available data (with or without annotations). Federated learning avoids the need for sensitive data-sharing, which makes it favorable for institutions and nations with strict regulatory policies on data privacy. Moreover, semi-supervision potentially reduces the annotation burden under a distributed setting.
The proposed framework is shown to be effective compared to fully supervised scenarios with conventional data sharing instead of model weight sharing.
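The contrast between model weight sharing and data sharing can be illustrated with a minimal sketch of generic federated averaging. This is not the authors' implementation: the abstract does not specify the aggregation rule, so weighting each site by its number of local scans is an assumption, and the names `federated_average`, `site_weights`, and `site_sizes` are hypothetical. Only weight values leave each site; the scans themselves never do.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This is a simplification for illustration only.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client weights, weighted by local dataset size.

    Generic federated averaging; assumed here for illustration, not taken
    from the paper.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: three sites holding different amounts of local data.
site_weights = [
    {"layer": [1.0, 2.0]},
    {"layer": [3.0, 4.0]},
    {"layer": [5.0, 6.0]},
]
site_sizes = [100, 100, 200]

global_weights = federated_average(site_weights, site_sizes)
print(global_weights)
```

In a full training loop, each site would train locally on its own scans, send its updated weights to a server, and receive the averaged weights back for the next round.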