DeGrave Alex J, Janizek Joseph D, Lee Su-In
Paul G. Allen School of Computer Science and Engineering, University of Washington.
Medical Scientist Training Program, University of Washington.
medRxiv. 2020 Oct 7:2020.09.13.20193565. doi: 10.1101/2020.09.13.20193565.
Artificial intelligence (AI) researchers and radiologists have recently reported AI systems that accurately detect COVID-19 in chest radiographs. However, the robustness of these systems remains unclear. Using state-of-the-art techniques in explainable AI, we demonstrate that recent deep learning systems for detecting COVID-19 from chest radiographs rely on confounding factors rather than medical pathology, creating an alarming situation in which the systems appear accurate but fail when tested in new hospitals. We observe that the approach used to obtain training data for these AI systems introduces a nearly ideal scenario for AI to learn these spurious "shortcuts." Because this approach to data collection has also been used to obtain training data for detecting COVID-19 in computed tomography scans and for medical imaging tasks related to other diseases, our study reveals a far-reaching problem in medical imaging AI. In addition, we show that evaluating a model on external data is insufficient to ensure that AI systems rely on medically relevant pathology, since the undesired "shortcuts" learned by AI systems may not impair performance in new hospitals. These findings demonstrate that explainable AI should be seen as a prerequisite to the clinical deployment of machine learning healthcare models.
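As an illustration only (not the authors' pipeline), the sketch below shows one common explainable-AI check of the kind the abstract alludes to: a gradient-based saliency map over an input radiograph, used to inspect whether a classifier attends to lung pathology or to confounds such as laterality markers and image borders. The DenseNet-121 model, the preprocessing, and the radiograph.png path are hypothetical stand-ins, and this is not the specific attribution method used in the study.

```python
# Minimal sketch of a gradient-based saliency check for an image classifier.
# Illustrative only: the model, preprocessing, and file path are hypothetical
# stand-ins, not the authors' COVID-19 detection pipeline.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Hypothetical: a generic DenseNet-121 (torchvision >= 0.13 weights API)
# stands in for a chest-radiograph COVID-19 classifier.
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("radiograph.png").convert("RGB")  # hypothetical input file
x = preprocess(img).unsqueeze(0).requires_grad_(True)

logits = model(x)
score = logits[0, logits.argmax()]  # score of the predicted class
score.backward()

# Saliency: max absolute input gradient across color channels per pixel.
# Strong attributions far outside the lung fields (markers, text, borders)
# would suggest the model is relying on shortcuts rather than pathology.
saliency = x.grad.detach().abs().max(dim=1)[0].squeeze(0)
print(saliency.shape)  # torch.Size([224, 224])
```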