López-Cabrera José Daniel, Orozco-Morales Rubén, Portal-Díaz Jorge Armando, Lovelle-Enríquez Orlando, Pérez-Díaz Marlén
Centro de Investigaciones de La Informática, Facultad de Matemática, Física y Computación, Universidad Central "Marta Abreu" de Las Villas, Villa Clara, Santa Clara, Cuba.
Departamento de Control Automático, Facultad de Ingeniería Eléctrica, Universidad Central "Marta Abreu" de Las Villas, Villa Clara, Santa Clara, Cuba.
Health Technol (Berl). 2021;11(6):1331-1345. doi: 10.1007/s12553-021-00609-8. Epub 2021 Oct 10.
Since the outbreak of the COVID-19 pandemic, computer vision researchers have been working on automatic identification of this disease using radiological images. The results achieved by automatic classification methods far exceed those of human specialists, with sensitivity as high as 100% being reported. However, prestigious radiology societies have stated that the use of this type of imaging alone is not recommended as a diagnostic method. According to some experts, the patterns presented in these images are nonspecific and subtle, overlapping with those of other viral pneumonias. This report seeks to evaluate the robustness and generalizability of different approaches using artificial intelligence, deep learning and computer vision to identify COVID-19 using chest X-ray images. We also seek to alert researchers and reviewers to the issue of "shortcut learning". Recommendations are presented to identify whether COVID-19 automatic classification models are being affected by shortcut learning. Firstly, papers using explainable artificial intelligence methods are reviewed. Secondly, the results of applying external validation sets are evaluated to determine the generalizability of these methods. Finally, studies that apply traditional computer vision methods to perform the same task are considered. It is evident that, when using the whole chest X-ray image or the bounding box of the lungs, the image regions that contribute most to the classification appear outside of the lung region, which is not plausible. In addition, when investigations evaluated their models on data sets external to the training set, the effectiveness of these models decreased significantly, although this may provide a more realistic representation of how the models will perform in the clinic. The results indicate that, so far, the existing models often involve shortcut learning, which makes their use less appropriate in the clinical setting.