Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study.

Affiliations

Computer Science and Engineering, University of Michigan, Ann Arbor.

Now with Computer Science Courant Institute, New York University, New York.

Publication information

JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295.

PMID: 38112814

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10731487/

Abstract

IMPORTANCE

Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established.

OBJECTIVES

To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors.

DESIGN, SETTING, AND PARTICIPANTS

Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants.

INTERVENTIONS

Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions.

MAIN OUTCOMES AND MEASURES

Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease.

RESULTS

Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations and 226 randomized to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When clinicians were shown standard AI model predictions without explanations, accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2); when they were also shown AI model explanations, accuracy increased over baseline by 4.4 percentage points (95% CI, 2.0 to 6.9). Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline. Providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model without explanations.
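The percentage-point changes above can be turned into approximate absolute accuracies by adding each change to the 73.0% baseline. The following is a minimal sketch of that arithmetic only; the dictionary keys and function names are illustrative and not taken from the study, and the sums are derived from the rounded point estimates in this abstract, so they may differ slightly from model-based estimates reported in the full article.

```python
# Point estimates copied from the RESULTS paragraph above.
BASELINE_ACCURACY = 73.0  # percent

# Change versus baseline, in percentage points (key names are illustrative).
CHANGE_VS_BASELINE = {
    ("standard", "no_explanation"): 2.9,
    ("standard", "explanation"): 4.4,
    ("biased", "no_explanation"): -11.3,
    ("biased", "explanation"): -9.1,
}


def implied_accuracy(model, arm):
    """Baseline plus the reported change, rounded to one decimal place."""
    return round(BASELINE_ACCURACY + CHANGE_VS_BASELINE[(model, arm)], 1)


def explanation_effect(model):
    """Explanation arm minus no-explanation arm, in percentage points."""
    with_expl = CHANGE_VS_BASELINE[(model, "explanation")]
    without_expl = CHANGE_VS_BASELINE[(model, "no_explanation")]
    return round(with_expl - without_expl, 1)


for model, arm in CHANGE_VS_BASELINE:
    print(model, arm, implied_accuracy(model, arm))
```

Subtracting the rounded point estimates for the biased model gives 2.2 percentage points, whereas the abstract reports 2.3; the gap presumably reflects rounding of the underlying estimates rather than an inconsistency.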

CONCLUSIONS AND RELEVANCE

Although standard AI models improved diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect.

TRIAL REGISTRATION

ClinicalTrials.gov Identifier: NCT06098950.
