Hu Kai, Huang Yingjie, Huang Wei, Tan Hui, Chen Zhineng, Zhong Zheng, Li Xuanya, Zhang Yuan, Gao Xieping
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China.
Key Laboratory of Medical Imaging and Artificial Intelligence of Hunan Province, Xiangnan University, Chenzhou 423000, China.
Neurocomputing (Amst). 2021 Oct 7;458:232-245. doi: 10.1016/j.neucom.2021.06.012. Epub 2021 Jun 7.
The outbreak and rapid spread of coronavirus disease 2019 (COVID-19) has had a huge impact on the lives and safety of people around the world. Chest CT is considered an effective tool for the diagnosis and follow-up of COVID-19. To speed up examination, automatic COVID-19 diagnostic techniques that apply deep learning to CT images have received increasing attention. However, the existing COVID-19 diagnosis datasets available for training are limited in number and in category coverage, and the number of initial COVID-19 samples is much smaller than the number of normal samples, which leads to class imbalance. Because data for some classes are abundant while data for others are scarce, classification algorithms have difficulty learning discriminative boundaries. Training robust deep neural networks on imbalanced data is therefore a fundamental, challenging, and important task in the diagnosis of COVID-19. In this paper, we create a challenging clinical dataset (named COVID19-Diag) with category diversity and propose a novel imbalanced data classification method using deep supervised learning with a self-adaptive auxiliary loss (DSN-SAAL) for COVID-19 diagnosis. Within a multi-scale, deep supervised network framework, the loss function accounts for both data overlap between CT slices and possible noisy labels in clinical datasets by integrating the effective number of samples and a weighting regularization term. The learning process jointly and automatically optimizes all parameters of the deep supervised network, making our model generally applicable to a wide range of datasets. Extensive experiments are conducted on COVID19-Diag and three public COVID-19 diagnosis datasets. The results show that DSN-SAAL outperforms state-of-the-art methods and is effective for COVID-19 diagnosis under varying degrees of data imbalance.
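The abstract does not give the formula behind "the effective number of samples". As a rough illustration only, the sketch below assumes the definition commonly used in class-balanced losses, E_n = (1 - beta^n) / (1 - beta), where n is a class's sample count and beta in [0, 1) models how much samples overlap. The function names, the value of beta, and the class counts are hypothetical; this is not the DSN-SAAL loss itself, which also includes a weighting regularization term and deep supervision.

```python
def effective_number(n, beta=0.999):
    """Effective number of samples, E_n = (1 - beta**n) / (1 - beta).

    Assumed definition (class-balanced loss style); beta is a hypothetical
    hyperparameter, not a value reported in the abstract.
    """
    if n <= 0:
        raise ValueError("class count must be positive")
    return (1.0 - beta ** n) / (1.0 - beta)


def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to 1 / E_n, scaled to sum to len(counts)."""
    inverse = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(inverse)
    return [w * scale for w in inverse]


# Hypothetical slice counts for three classes, from most to least frequent.
counts = [5000, 1200, 300]
weights = class_balanced_weights(counts)

for n, w in zip(counts, weights):
    print(f"count={n:5d}  weight={w:.3f}")
```

Under this assumption, rarer classes receive larger weights, but the weights grow more slowly than plain inverse frequency because heavily overlapping samples (such as adjacent CT slices) add little new information.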