Departamento de Informática, Universidade Estadual de Maringá, Maringá 87020-900, Brazil.
Instituto Federal do Paraná, Pinhais 83330-200, Brazil.
Sensors (Basel). 2021 Oct 27;21(21):7116. doi: 10.3390/s21217116.
COVID-19 frequently causes pneumonia, which can be diagnosed using imaging exams. Chest X-ray (CXR) is often useful because it is cheap, fast, widely available, and uses relatively little radiation. Here, we demonstrate the impact of lung segmentation on COVID-19 identification from CXR images and evaluate which image content influenced the classification most. Semantic segmentation was performed using a U-Net CNN architecture, and classification using three CNN architectures (VGG, ResNet, and Inception). Explainable Artificial Intelligence techniques were employed to estimate the impact of segmentation. A three-class database was composed: lung opacity (pneumonia), COVID-19, and normal. We assessed the impact of building a CXR image database from different sources, and how well COVID-19 identification generalizes from one source to another. The segmentation achieved a Jaccard distance of 0.034 and a Dice coefficient of 0.982. Classification using segmented images achieved an F1-Score of 0.88 for the multi-class setup and 0.83 for COVID-19 identification. In the cross-dataset scenario, we obtained an F1-Score of 0.74 and an area under the ROC curve of 0.9 for COVID-19 identification using segmented images. The experiments support the conclusion that, even after segmentation, underlying factors that differ between sources introduce a strong bias.
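For readers unfamiliar with the two segmentation metrics quoted above, the following is a minimal sketch of how a Jaccard distance and a Dice coefficient can be computed for one pair of binary lung masks. It is not the authors' evaluation code: the function name `overlap_metrics`, the flattened 0/1 mask representation, and the toy masks are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (jaccard_distance, dice) for two flattened binary masks.

    Hypothetical helper for illustration; masks are sequences of 0/1 pixels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as lung in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 0.0, 1.0

    jaccard_index = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return 1.0 - jaccard_index, dice


# Toy 6-pixel masks, not data from the study.
pred_mask = [0, 1, 1, 1, 0, 0]
truth_mask = [0, 1, 1, 0, 1, 0]
jaccard_distance, dice = overlap_metrics(pred_mask, truth_mask)
print(f"Jaccard distance = {jaccard_distance:.3f}, Dice = {dice:.3f}")
```

For a single pair of masks the two quantities are tied together by Dice = 2J / (1 + J), where J = 1 - Jaccard distance, so a small Jaccard distance and a Dice coefficient close to 1 express the same high degree of overlap.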