Computer Science and Engineering, University of Michigan, Ann Arbor.
Now with Computer Science, Courant Institute, New York University, New York.
JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295.
IMPORTANCE: Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established.
OBJECTIVES: To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors.
DESIGN, SETTING, AND PARTICIPANTS: Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants.
INTERVENTIONS: Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions.
MAIN OUTCOMES AND MEASURES: Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease.
RESULTS: Median participant age was 34 years (IQR, 31-39), and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations and 226 randomized to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2), and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline, and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model.
CONCLUSIONS AND RELEVANCE: Although standard AI models improve diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect.
TRIAL REGISTRATION: ClinicalTrials.gov Identifier: NCT06098950.