IEEE J Biomed Health Inform. 2020 Oct;24(10):2787-2797. doi: 10.1109/JBHI.2020.3018181. Epub 2020 Aug 20.
Coronavirus Disease 2019 (COVID-19) has spread rapidly worldwide since it was first reported. Timely diagnosis of COVID-19 is crucial for both disease control and patient care. Non-contrast thoracic computed tomography (CT) has been identified as an effective diagnostic tool, yet the outbreak has placed tremendous pressure on the radiologists who read the exams and may lead to fatigue-related misdiagnosis. Reliable automatic classification algorithms could be of great help; however, they usually require a considerable number of COVID-19 cases for training, which are difficult to acquire in a timely manner. Meanwhile, effectively utilizing the existing archive of non-COVID-19 data (the negative samples) in the presence of severe class imbalance is another challenge. In addition, the sudden outbreak necessitates fast algorithm development. In this work, we propose a novel approach for effective and efficient training of COVID-19 classification networks using a small number of COVID-19 CT exams and an archive of negative samples. Concretely, a novel self-supervised learning method is proposed to extract features from the COVID-19 and negative samples. Then, two kinds of soft labels ('difficulty' and 'diversity') are generated for the negative samples by computing the earth mover's distances between the features of the negative and COVID-19 samples, from which the data 'values' of the negative samples can be assessed. A preset number of negative samples is then selected accordingly and fed to the neural network for training. Experimental results show that our approach can achieve superior performance using only about half of the negative samples, substantially reducing model training time.
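The abstract does not specify how the earth mover's distance (EMD) is set up or how the two soft labels are combined into a data value, so the following is only a minimal sketch of the general idea under explicitly hypothetical assumptions. It assumes each CT exam is represented by an equal-size set of feature vectors from some encoder (standing in for the self-supervised features) with uniform weight on each vector; that 'difficulty' means closeness to the nearest COVID-19 exam; that 'diversity' means how atypical a negative's row of EMDs is; and that the value is a weighted sum controlled by a hypothetical parameter `alpha`. The names `emd_equal_sets`, `emd_matrix`, `soft_labels` and `select_negatives` are illustrative, not the authors' code. It relies on NumPy and SciPy's `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def emd_equal_sets(a: np.ndarray, b: np.ndarray) -> float:
    """Earth mover's distance between two equal-size sets of feature vectors.

    Assumption (not from the paper): every vector carries uniform weight.
    With uniform weights and equal set sizes an optimal transport plan is a
    permutation, so the EMD reduces to a minimum-cost assignment problem.
    """
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal-size feature sets")
    # Euclidean ground distance between every pair of feature vectors.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


def emd_matrix(neg_sets, pos_sets) -> np.ndarray:
    """D[i, j] = EMD between negative exam i and COVID-19 exam j."""
    return np.array([[emd_equal_sets(n, p) for p in pos_sets] for n in neg_sets])


def _minmax(x: np.ndarray) -> np.ndarray:
    """Rescale to [0, 1]; a constant array maps to all zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def soft_labels(d: np.ndarray):
    """Hypothetical 'difficulty' and 'diversity' scores from the EMD matrix."""
    # Hypothetical difficulty: a negative close to some COVID-19 exam
    # is assumed to be harder to separate from the positives.
    difficulty = 1.0 - _minmax(d.min(axis=1))
    # Hypothetical diversity: distance of a negative's EMD profile (its row
    # of D) from the mean profile, so atypical negatives score higher.
    diversity = _minmax(np.linalg.norm(d - d.mean(axis=0), axis=1))
    return difficulty, diversity


def select_negatives(difficulty, diversity, n_select: int, alpha: float = 0.5):
    """Indices of the n_select negatives with the highest (hypothetical) value."""
    value = alpha * difficulty + (1.0 - alpha) * diversity
    return np.argsort(-value, kind="stable")[:n_select]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: 3 COVID-19 exams and 6 negatives, 4 vectors each.
    pos = [rng.normal(0.0, 1.0, size=(4, 3)) for _ in range(3)]
    neg = [rng.normal(5.0, 1.0, size=(4, 3)) for _ in range(6)]
    diff, div = soft_labels(emd_matrix(neg, pos))
    # Keep roughly half of the negatives, echoing the abstract's setting.
    print(select_negatives(diff, div, n_select=3))
```

The assignment formulation is exact only for equal set sizes with uniform weights; sets of different sizes or non-uniform weights would need a general optimal-transport solver instead.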