Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study.

Author Affiliations

Computer Science and Engineering, University of Michigan, Ann Arbor.

Now with Computer Science, Courant Institute, New York University, New York.

Publication Information

JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295.

Abstract

IMPORTANCE

Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established.

OBJECTIVES

To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors.

DESIGN, SETTING, AND PARTICIPANTS

Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants.

INTERVENTIONS

Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions.
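The allocation described above can be sketched in code. This is only an illustration, not the study's actual randomization procedure: the `Vignette` class, the `assign_vignettes` function, the 50/50 arm probability, and the shuffled order of standard and biased vignettes are all assumptions. The sketch covers only the 8 vignettes whose conditions the abstract specifies (2 baseline, 6 with AI input).

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Vignette:
    position: int      # order shown to the clinician (hypothetical)
    ai_input: str      # "none", "standard", or "biased"
    explanation: bool  # whether an image-based explanation is shown


def assign_vignettes(rng: random.Random) -> list[Vignette]:
    """Build one clinician's schedule under the assumptions stated above."""
    # Arm assignment: AI predictions with or without explanations.
    # The equal probability is an assumption, not taken from the abstract.
    with_explanations = rng.random() < 0.5

    # 3 standard-model and 3 systematically biased vignettes;
    # shuffling their order is an assumption.
    ai_types = ["standard"] * 3 + ["biased"] * 3
    rng.shuffle(ai_types)

    # 2 baseline vignettes without AI input come first.
    schedule = [Vignette(i + 1, "none", False) for i in range(2)]
    schedule += [
        Vignette(i + 3, kind, with_explanations)
        for i, kind in enumerate(ai_types)
    ]
    return schedule


schedule = assign_vignettes(random.Random(0))
for vignette in schedule:
    print(vignette)
```

Because the explanation arm is fixed per clinician, every AI vignette in one schedule carries the same `explanation` value, which mirrors the between-clinician comparison reported in the results.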

MAIN OUTCOMES AND MEASURES

Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease.

RESULTS

Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations and 226 randomized to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2); when clinicians were also shown AI model explanations, it increased by 4.4 percentage points (95% CI, 2.0 to 6.9). Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline. Providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) over the systematically biased AI model without explanations.
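To make the percentage-point changes concrete, the following sketch adds each reported change to the reported baseline. It is a back-of-the-envelope illustration only: the names `BASELINE`, `CHANGES`, and `implied_accuracy` are hypothetical, and simple addition of rounded point estimates is an assumption that need not match how the study estimated its effects.

```python
# Reported baseline accuracy (%) and changes from baseline (percentage points).
BASELINE = 73.0
CHANGES = {
    ("standard", "no explanation"): +2.9,
    ("standard", "explanation"): +4.4,
    ("biased", "no explanation"): -11.3,
    ("biased", "explanation"): -9.1,
}


def implied_accuracy(baseline: float, change: float) -> float:
    """Add a percentage-point change to a baseline percentage."""
    return round(baseline + change, 1)


implied = {
    condition: implied_accuracy(BASELINE, change)
    for condition, change in CHANGES.items()
}

# Effect of adding explanations to the biased model, from the rounded figures.
explanation_effect_biased = round(
    CHANGES[("biased", "explanation")] - CHANGES[("biased", "no explanation")], 1
)

for condition, accuracy in implied.items():
    print(condition, accuracy)
print("explanations vs none (biased model):", explanation_effect_biased)
```

The rounded figures give a difference of 2.2 percentage points between the two biased-model conditions, whereas the abstract reports 2.3. The gap is presumably due to rounding of the underlying estimates.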

CONCLUSIONS AND RELEVANCE

Although standard AI models improved diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect.

TRIAL REGISTRATION

ClinicalTrials.gov Identifier: NCT06098950.
