IEEE J Biomed Health Inform. 2020 Oct;24(10):2806-2813. doi: 10.1109/JBHI.2020.3023246. Epub 2020 Sep 10.
The coronavirus disease 2019 (COVID-19) pandemic has led to a global public health crisis spreading across hundreds of countries. With the continuous growth of new infections, automated tools for COVID-19 identification from CT images are highly desired to assist clinical diagnosis and reduce the tedious workload of image interpretation. To enlarge the datasets available for developing machine learning methods, it is helpful to aggregate cases from different medical systems so that robust and generalizable models can be learned. This paper proposes a novel joint learning framework that performs accurate COVID-19 identification by learning effectively from heterogeneous datasets with distribution discrepancy. We build a powerful backbone by redesigning the recently proposed COVID-Net in terms of network architecture and learning strategy to improve prediction accuracy and learning efficiency. On top of our improved backbone, we further explicitly tackle the cross-site domain shift by conducting separate feature normalization in latent space. Moreover, we propose to use a contrastive training objective to enhance the domain invariance of semantic embeddings, boosting the classification performance on each dataset. We develop and evaluate our method with two public large-scale COVID-19 diagnosis datasets made up of CT images. Extensive experiments show that our approach consistently improves the performance on both datasets, outperforming the original COVID-Net trained on each dataset by 12.16% and 14.23% in AUC respectively, and also exceeding existing state-of-the-art multi-site learning methods.
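The idea of separate feature normalization can be illustrated with a small sketch. The abstract does not give implementation details, so this is not the authors' implementation: it assumes that "separate" means each site keeps its own per-feature mean and variance, and it works on plain lists rather than on latent feature maps inside a network. The class name `SiteSpecificNorm`, the site labels, and the toy feature values are all hypothetical.

```python
import math


class SiteSpecificNorm:
    """Hypothetical helper: one set of normalization statistics per site."""

    def __init__(self, eps=1e-5):
        self.eps = eps
        self.stats = {}  # site name -> (per-feature mean, per-feature variance)

    def fit(self, site, features):
        # features: list of equal-length feature vectors from a single site
        n = len(features)
        dim = len(features[0])
        mean = [sum(row[j] for row in features) / n for j in range(dim)]
        var = [
            sum((row[j] - mean[j]) ** 2 for row in features) / n
            for j in range(dim)
        ]
        self.stats[site] = (mean, var)

    def transform(self, site, features):
        # Normalize with the statistics of the site the samples came from,
        # never with statistics pooled across sites.
        mean, var = self.stats[site]
        return [
            [(x - m) / math.sqrt(v + self.eps) for x, m, v in zip(row, mean, var)]
            for row in features
        ]


# Toy features from two sites whose scales and offsets differ (made-up values).
site_a = [[1.0, 10.0], [3.0, 30.0]]
site_b = [[100.0, 5.0], [300.0, 7.0]]

norm = SiteSpecificNorm()
norm.fit("site_a", site_a)
norm.fit("site_b", site_b)

za = norm.transform("site_a", site_a)
zb = norm.transform("site_b", site_b)
```

After the per-site transform, both toy datasets are centered at zero with comparable scale, even though their raw values differ by orders of magnitude. In this sketch only the normalization statistics are site-specific, so any layers applied afterwards could in principle be shared across sites.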