DeGrave Alex J, Cai Zhuo Ran, Janizek Joseph D, Daneshjou Roxana, Lee Su-In
Paul G. Allen School of Computer Science and Engineering, University of Washington.
Medical Scientist Training Program, University of Washington.
medRxiv. 2023 May 16:2023.05.12.23289878. doi: 10.1101/2023.05.12.23289878.
Despite the proliferation and clinical deployment of artificial intelligence (AI)-based medical software devices, most remain black boxes that are uninterpretable to key stakeholders, including patients, physicians, and even the developers of the devices. Here, we present a general model auditing framework that combines insights from medical experts with a highly expressive form of explainable AI, one that leverages generative models, to understand the reasoning processes of AI devices. We then apply this framework to generate the first thorough, medically interpretable picture of the reasoning processes of machine-learning-based medical image AI. In our synergistic framework, a generative model first renders "counterfactual" medical images, which in essence visually represent the reasoning process of a medical AI device, and physicians then translate these counterfactual images into medically meaningful features. As our use case, we audit five high-profile AI devices in dermatology, an area of particular interest since dermatology AI devices are beginning to be deployed globally. We reveal how dermatology AI devices rely both on features used by human dermatologists, such as lesional pigmentation patterns, and on multiple previously unreported, potentially undesirable features, such as background skin texture and image color balance. Our study also sets a precedent for the rigorous application of explainable AI to understand AI in any specialized domain and provides a means for practitioners, clinicians, and regulators to uncloak AI's powerful but previously enigmatic reasoning processes in a medically understandable way.