Department of Computer Science, IIT Jodhpur, Karwar, Rajasthan, India.
Mahajan Imaging, New Delhi, India.
PLoS One. 2022 Oct 14;17(10):e0271931. doi: 10.1371/journal.pone.0271931. eCollection 2022.
Consistent clinical observations of characteristic findings of COVID-19 pneumonia on chest X-rays have motivated the research community to develop fast and reliable methods for screening suspected patients. Several machine learning algorithms have been proposed to detect lung abnormalities specific to COVID-19 pneumonia on chest X-rays and to distinguish them from other etiologies of pneumonia. However, despite the enormous magnitude of the pandemic, there are very few public databases of COVID-19 pneumonia and, to the best of our knowledge, no database with annotations of abnormalities on the chest X-rays of COVID-19-affected patients. Annotated X-ray databases can be of significant value in the design and development of algorithms for disease prediction. Further, explainability analysis of existing or new deep learning algorithms will be enhanced significantly by access to ground-truth abnormality annotations. The proposed COVID Abnormality Annotation for X-Rays (CAAXR) database is built upon the BIMCV-COVID19+ database, a large-scale dataset of COVID-19+ chest X-rays. The primary contribution of this study is the annotation of abnormalities in over 1700 frontal chest X-rays. We also define protocols for semantic segmentation and classification to enable robust evaluation of algorithms, and we provide benchmark results on these protocols using popular deep learning models: DenseNet, ResNet, MobileNet, and VGG for classification, and UNet, SegNet, and Mask-RCNN for semantic segmentation. Classwise accuracy, sensitivity, and AUC-ROC scores are reported for the classification models, and IoU and Dice scores for the segmentation models.
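For readers unfamiliar with the two segmentation metrics named in the abstract, the following is a minimal sketch of how IoU and Dice are commonly computed for a pair of binary masks. It is not taken from the paper: the function name `iou_and_dice`, the toy masks, and the convention of scoring two empty masks as a perfect match are all assumptions made for illustration, and the authors' evaluation code may differ.

```python
import numpy as np


def iou_and_dice(pred_mask, true_mask):
    """Return (IoU, Dice) for two binary masks of the same shape.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0, 1.0

    iou = intersection / union          # |A ∩ B| / |A ∪ B|
    dice = 2 * intersection / total     # 2|A ∩ B| / (|A| + |B|)
    return float(iou), float(dice)


# Toy example: predicted abnormality covers two pixels, annotation covers one.
pred = [[1, 1],
        [0, 0]]
true = [[1, 0],
        [0, 0]]
iou, dice = iou_and_dice(pred, true)
print(f"IoU={iou:.3f}, Dice={dice:.3f}")
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; reporting both, as the abstract does, mainly helps comparison with prior work that uses one or the other.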