Facilitating Trust Calibration in Artificial Intelligence-Driven Diagnostic Decision Support Systems for Determining Physicians' Diagnostic Accuracy: Quasi-Experimental Study.

Affiliation

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga-gun, Tochigi, 321-0293, Japan, 81 282-86-1111, 81 282-86-4775.

Publication information

JMIR Form Res. 2024 Nov 27;8:e58666. doi: 10.2196/58666.


DOI:10.2196/58666
PMID:39602469
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11612524/
Abstract

BACKGROUND: Diagnostic errors are significant problems in medical care. Despite the usefulness of artificial intelligence (AI)-based diagnostic decision support systems, the overreliance of physicians on AI-generated diagnoses may lead to diagnostic errors.

OBJECTIVE: We investigated the safe use of AI-based diagnostic decision support systems with trust calibration by adjusting trust levels to match the actual reliability of AI.

METHODS: A quasi-experimental study was conducted at Dokkyo Medical University, Japan, with physicians allocated (1:1) to the intervention and control groups. A total of 20 clinical cases were created based on the medical histories recorded by an AI-driven automated medical history-taking system from actual patients who visited a community-based hospital in Japan. The participants reviewed the medical histories of 20 clinical cases generated by an AI-driven automated medical history-taking system with an AI-generated list of 10 differential diagnoses and provided 1 to 3 possible diagnoses. Physicians were asked whether the final diagnosis was in the AI-generated list of 10 differential diagnoses in the intervention group, which served as the trust calibration. We analyzed the diagnostic accuracy of physicians and the correctness of the trust calibration in the intervention group. We also investigated the relationship between the accuracy of the trust calibration and the diagnostic accuracy of physicians, and the physicians' confidence level regarding the use of AI.

RESULTS: Among the 20 physicians assigned to the intervention (n=10) and control (n=10) groups, the mean age was 30.9 (SD 3.9) years and 31.7 (SD 4.2) years, the proportion of men was 80% and 60%, and the mean postgraduate year was 5.8 (SD 2.9) and 7.2 (SD 4.6), respectively, with no significant differences. The physicians' diagnostic accuracy was 41.5% in the intervention group and 46% in the control group, with no significant difference (95% CI -0.75 to 2.55; P=.27). The overall accuracy of the trust calibration was only 61.5%, and despite correct calibration, the diagnostic accuracy was 54.5%. In the multivariate logistic regression model, the accuracy of the trust calibration was a significant contributor to the diagnostic accuracy of physicians (adjusted odds ratio 5.90, 95% CI 2.93-12.46; P<.001). The mean confidence level for AI was 72.5% in the intervention group and 45% in the control group, with no significant difference.

CONCLUSIONS: Trust calibration did not significantly improve physicians' diagnostic accuracy when considering the differential diagnoses generated by reading medical histories and the possible differential diagnosis lists of an AI-driven automated medical history-taking system. As this was a formative study, the small sample size and suboptimal trust calibration methods may have contributed to the lack of significant differences. This study highlights the need for a larger sample size and the implementation of supportive measures of trust calibration.
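
The reported percentages can be turned into approximate response counts with a little arithmetic. The sketch below is a back-of-envelope reconstruction, not the authors' analysis. It assumes each of the 10 physicians per group answered all 20 cases (200 responses per group), which the abstract does not state explicitly, and it reads the 54.5% figure as diagnostic accuracy among correctly calibrated responses. The function names are hypothetical. The crude odds ratio it produces is unadjusted, so it is not expected to match the adjusted odds ratio of 5.90 from the multivariate model.

```python
# Back-of-envelope reconstruction of counts from the percentages in the abstract.
# ASSUMPTION: 10 physicians x 20 cases = 200 responses per group (not stated in the source).
RESPONSES_PER_GROUP = 10 * 20


def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of responses implied by a reported percentage."""
    return round(percent / 100 * total)


def crude_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unadjusted odds ratio for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Diagnostic accuracy per group (41.5% and 46% in the abstract).
correct_intervention = implied_count(41.5, RESPONSES_PER_GROUP)
correct_control = implied_count(46.0, RESPONSES_PER_GROUP)

# Trust calibration accuracy in the intervention group (61.5% in the abstract).
calibrated = implied_count(61.5, RESPONSES_PER_GROUP)

# ASSUMPTION: 54.5% is the diagnostic accuracy among correctly calibrated responses.
correct_given_calibrated = implied_count(54.5, calibrated)

# Remaining cells of the implied 2x2 table (calibration correct? x diagnosis correct?).
wrong_given_calibrated = calibrated - correct_given_calibrated
miscalibrated = RESPONSES_PER_GROUP - calibrated
correct_given_miscalibrated = correct_intervention - correct_given_calibrated
wrong_given_miscalibrated = miscalibrated - correct_given_miscalibrated

crude_or = crude_odds_ratio(
    correct_given_calibrated,
    wrong_given_calibrated,
    correct_given_miscalibrated,
    wrong_given_miscalibrated,
)

if __name__ == "__main__":
    print("Correct diagnoses, intervention:", correct_intervention)
    print("Correct diagnoses, control:", correct_control)
    print("Correctly calibrated responses:", calibrated)
    print("Crude (unadjusted) odds ratio:", round(crude_or, 2))
```

Under these assumptions, the crude odds ratio comes out near 4.6, the same direction as the reported adjusted estimate. The exact counts should be taken from the full text rather than from this reconstruction.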


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f04b/11612524/943ecc676962/formative-v8-e58666-g001.jpg
