Xu Yujia, Lam Hak-Keung, Jia Guangyu
Centre for Robotics Research, Department of Engineering, King's College London, WC2R 2LS, UK.
Neurocomputing (Amst). 2021 Jul 5;443:96-105. doi: 10.1016/j.neucom.2021.03.034. Epub 2021 Mar 18.
The early detection of infection is significant for the fight against the ongoing COVID-19 pandemic. Chest X-ray (CXR) imaging is an efficient screening technique with which lung infections can be detected. This paper aims to distinguish COVID-19-positive cases from four other classes, namely normal, tuberculosis (TB), bacterial pneumonia (BP), and viral pneumonia (VP), using CXR images. Existing COVID-19 classification studies have achieved some success with deep learning techniques but sometimes lack interpretability and generalization ability. Hence, we propose MANet, a two-stage classification method, to address these issues in computer-aided COVID-19 diagnosis. Specifically, at the first stage a segmentation model predicts masks for all CXR images to extract their lung regions. At the second stage, a classification CNN then classifies the segmented CXR images into five classes based only on the preserved lung regions. For this segmentation-based classification task, we propose the mask attention mechanism (MA), which uses the masks predicted at the first stage as spatial attention maps to adjust the features of the second-stage CNN. The MA spatial attention maps calculate, for each feature, the percentage of masked pixels in its receptive field, suppressing feature values according to the overlap between their receptive fields and the segmented lung regions. In evaluation, we segment the lung regions of all CXR images with a UNet with a ResNet backbone, and then classify the segmented CXR images using four classic CNNs with or without MA: ResNet34, ResNet50, VGG16, and Inceptionv3. The experimental results show that classification models with MA have higher classification accuracy, a more stable training process, and better interpretability and generalization ability than those without it. Among the evaluated classification models, ResNet50 with MA achieves the highest average test accuracy of 96.32% over three runs, with the highest single-run accuracy being 97.06%. Meanwhile, the attention heat maps visualized by Grad-CAM indicate that models with MA make more reliable predictions based on the pathological patterns in lung regions. This further demonstrates the potential of MANet to provide clinicians with diagnostic assistance.
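To make the mask attention (MA) idea concrete, the following is a minimal sketch, assuming a PyTorch setting: the binary lung mask from the stage-one segmentation model is pooled down to a feature map's spatial resolution, so each location holds the fraction of lung pixels in its footprint, and the features are scaled by that value. The function and variable names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def mask_attention(features: torch.Tensor, lung_mask: torch.Tensor) -> torch.Tensor:
    """Scale CNN features by the fraction of lung-mask pixels in each location's footprint.

    features:  (N, C, h, w) feature map from an intermediate CNN layer.
    lung_mask: (N, 1, H, W) binary mask from the stage-one segmentation model,
               1 inside the lung region and 0 elsewhere.
    """
    h, w = features.shape[-2:]
    # Average-pooling the binary mask to the feature resolution yields, at each
    # location, the percentage of lung pixels covered by that location -- a proxy
    # for the receptive-field overlap rate used as a spatial attention map.
    attn = F.adaptive_avg_pool2d(lung_mask.float(), output_size=(h, w))  # values in [0, 1]
    # Features whose footprints fall outside the lung are suppressed;
    # those fully inside the lung are kept unchanged.
    return features * attn


if __name__ == "__main__":
    feats = torch.randn(2, 256, 28, 28)                   # hypothetical ResNet stage output
    mask = (torch.rand(2, 1, 224, 224) > 0.5).float()     # placeholder lung masks
    print(mask_attention(feats, mask).shape)               # torch.Size([2, 256, 28, 28])
```

In a full pipeline along the lines described above, such a scaling would be applied to one or more backbone feature maps before the classifier head, so the classification decision is driven by features inside the segmented lung regions.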