Gaube Susanne, Suresh Harini, Raue Martina, Merritt Alexander, Berkowitz Seth J, Lermer Eva, Coughlin Joseph F, Guttag John V, Colak Errol, Ghassemi Marzyeh
Department of Psychology, University of Regensburg, Regensburg, Germany.
MIT AgeLab, Massachusetts Institute of Technology, Cambridge, MA, USA.
NPJ Digit Med. 2021 Feb 19;4(1):31. doi: 10.1038/s41746-021-00385-9.
Artificial intelligence (AI) models for decision support have been developed for clinical settings such as radiology, but little work has evaluated the potential impact of such systems. In this study, physicians received chest X-rays and diagnostic advice, some of which was inaccurate, and were asked to evaluate advice quality and make diagnoses. All advice was generated by human experts, but some was labeled as coming from an AI system. As a group, radiologists rated advice as lower quality when it appeared to come from an AI system; physicians with less task expertise did not. Diagnostic accuracy was significantly worse when participants received inaccurate advice, regardless of its purported source. This work raises important considerations for how advice, whether from AI or non-AI sources, should be deployed in clinical environments.