


Facilitating Trust Calibration in Artificial Intelligence-Driven Diagnostic Decision Support Systems for Determining Physicians' Diagnostic Accuracy: Quasi-Experimental Study.

Affiliation

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga-gun, Tochigi, 321-0293, Japan, 81 282-86-1111, 81 282-86-4775.

Publication Information

JMIR Form Res. 2024 Nov 27;8:e58666. doi: 10.2196/58666.

DOI: 10.2196/58666
PMID: 39602469
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11612524/
Abstract

BACKGROUND

Diagnostic errors are significant problems in medical care. Despite the usefulness of artificial intelligence (AI)-based diagnostic decision support systems, the overreliance of physicians on AI-generated diagnoses may lead to diagnostic errors.

OBJECTIVE

We investigated the safe use of AI-based diagnostic decision support systems with trust calibration by adjusting trust levels to match the actual reliability of AI.

METHODS

A quasi-experimental study was conducted at Dokkyo Medical University, Japan, with physicians allocated (1:1) to the intervention and control groups. A total of 20 clinical cases were created from the medical histories recorded by an AI-driven automated medical history-taking system for actual patients who visited a community-based hospital in Japan. Participants reviewed these medical histories alongside an AI-generated list of 10 differential diagnoses and provided 1 to 3 possible diagnoses per case. As the trust-calibration step, physicians in the intervention group were additionally asked whether the final diagnosis appeared in the AI-generated list of 10 differential diagnoses. We analyzed the diagnostic accuracy of physicians and the correctness of the trust calibration in the intervention group, and investigated the relationship between trust-calibration accuracy, physicians' diagnostic accuracy, and physicians' confidence in the use of AI.
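The intervention-group scoring described above can be sketched as follows. The records and field names here are hypothetical illustrations, not the study's data: calibration is counted as correct when the physician's judgement about the AI list matches whether the final diagnosis was actually in it, and diagnostic accuracy is tallied separately.

```python
# Hypothetical per-case records for one intervention-group physician.
# Field names are illustrative; the study's actual scoring may differ.
cases = [
    {"final_dx_in_ai_list": True,  "physician_said_in_list": True,  "physician_correct": True},
    {"final_dx_in_ai_list": True,  "physician_said_in_list": False, "physician_correct": False},
    {"final_dx_in_ai_list": False, "physician_said_in_list": False, "physician_correct": True},
    {"final_dx_in_ai_list": False, "physician_said_in_list": True,  "physician_correct": False},
]

# Trust calibration is correct when the physician's answer matches
# whether the final diagnosis really appeared in the AI's top-10 list.
calibration_acc = sum(
    c["final_dx_in_ai_list"] == c["physician_said_in_list"] for c in cases
) / len(cases)

# Diagnostic accuracy is scored independently of the calibration answer.
diagnostic_acc = sum(c["physician_correct"] for c in cases) / len(cases)

print(calibration_acc, diagnostic_acc)
```

Keeping the two tallies separate mirrors the study's design, which relates calibration correctness to diagnostic accuracy rather than folding them into one score.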

RESULTS

Among the 20 physicians assigned to the intervention (n=10) and control (n=10) groups, the mean age was 30.9 (SD 3.9) years and 31.7 (SD 4.2) years, the proportion of men was 80% and 60%, and the mean postgraduate year was 5.8 (SD 2.9) and 7.2 (SD 4.6), respectively, with no significant differences. The physicians' diagnostic accuracy was 41.5% in the intervention group and 46% in the control group, with no significant difference (95% CI -0.75 to 2.55; P=.27). The overall accuracy of the trust calibration was only 61.5%, and despite correct calibration, the diagnostic accuracy was 54.5%. In the multivariate logistic regression model, the accuracy of the trust calibration was a significant contributor to the diagnostic accuracy of physicians (adjusted odds ratio 5.90, 95% CI 2.93-12.46; P<.001). The mean confidence level for AI was 72.5% in the intervention group and 45% in the control group, with no significant difference.
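The reported regression result can be sanity-checked arithmetically: logistic-regression confidence intervals are symmetric on the log-odds scale, so the adjusted odds ratio (5.90) and its 95% CI (2.93-12.46) should be mutually consistent up to rounding. This is plain arithmetic on the published numbers, not a re-analysis:

```python
import math

# Reported adjusted odds ratio and 95% CI from the study's
# multivariate logistic regression.
or_hat, ci_lo, ci_hi = 5.90, 2.93, 12.46

# On the log-odds scale: beta = ln(OR),
# SE = (ln(upper) - ln(lower)) / (2 * 1.96).
beta = math.log(or_hat)
se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)

# Reconstructing the CI from beta and SE approximately recovers the
# reported bounds (small discrepancies reflect rounding in the paper).
lo = math.exp(beta - 1.96 * se)
hi = math.exp(beta + 1.96 * se)

print(round(beta, 3), round(se, 3), round(lo, 2), round(hi, 2))
```

The reconstructed interval lands close to the published 2.93-12.46, confirming the reported OR and CI are internally consistent.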

CONCLUSIONS

Trust calibration did not significantly improve physicians' diagnostic accuracy when considering the differential diagnoses generated by reading medical histories and the possible differential diagnosis lists of an AI-driven automated medical history-taking system. As this was a formative study, the small sample size and suboptimal trust calibration methods may have contributed to the lack of significant differences. This study highlights the need for a larger sample size and the implementation of supportive measures of trust calibration.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f04b/11612524/943ecc676962/formative-v8-e58666-g001.jpg

Similar Articles

1. Facilitating Trust Calibration in Artificial Intelligence-Driven Diagnostic Decision Support Systems for Determining Physicians' Diagnostic Accuracy: Quasi-Experimental Study. JMIR Form Res. 2024 Nov 27;8:e58666. doi: 10.2196/58666.
2. Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone. Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
3. Behavioral interventions to reduce risk for sexual transmission of HIV among men who have sex with men. Cochrane Database Syst Rev. 2008 Jul 16;(3):CD001230. doi: 10.1002/14651858.CD001230.pub2.
4. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study. Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.
5. Artificial intelligence for detecting keratoconus. Cochrane Database Syst Rev. 2023 Nov 15;11(11):CD014911. doi: 10.1002/14651858.CD014911.pub2.
6. Sertindole for schizophrenia. Cochrane Database Syst Rev. 2005 Jul 20;2005(3):CD001715. doi: 10.1002/14651858.CD001715.pub2.
7. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
8. Systemic treatments for metastatic cutaneous melanoma. Cochrane Database Syst Rev. 2018 Feb 6;2(2):CD011123. doi: 10.1002/14651858.CD011123.pub2.
9. Interventions for supporting pregnant women's decision-making about mode of birth after a caesarean. Cochrane Database Syst Rev. 2013 Jul 30;2013(7):CD010041. doi: 10.1002/14651858.CD010041.pub2.
10. Shared decision-making for people with asthma. Cochrane Database Syst Rev. 2017 Oct 3;10(10):CD012330. doi: 10.1002/14651858.CD012330.pub2.

Cited By

1. Development and verification of a convolutional neural network-based model for automatic mandibular canal localization on multicenter CBCT images. BMC Oral Health. 2025 Aug 21;25(1):1352. doi: 10.1186/s12903-025-06724-6.

References

1. Automation Bias and Assistive AI: Risk of Harm From AI-Driven Clinical Decision Support. JAMA. 2023 Dec 19;330(23):2255-2257. doi: 10.1001/jama.2023.22557.
2. Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study. JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295.
3. Prevalence of atypical presentations among outpatients and associations with diagnostic error. Diagnosis (Berl). 2023 Dec 8;11(1):40-48. doi: 10.1515/dx-2023-0060. eCollection 2024 Feb 1.
4. Improving decision accuracy using a clinical decision support system for medical students during history-taking: a randomized clinical trial. BMC Med Educ. 2023 May 25;23(1):383. doi: 10.1186/s12909-023-04370-6.
5. Incidence of Diagnostic Errors Among Unexpectedly Hospitalized Patients Using an Automated Medical History-Taking System With a Differential Diagnosis Generator: Retrospective Observational Study. JMIR Med Inform. 2022 Jan 27;10(1):e35225. doi: 10.2196/35225.
6. Efficacy of Artificial-Intelligence-Driven Differential-Diagnosis List on the Diagnostic Accuracy of Physicians: An Open-Label Randomized Controlled Study. Int J Environ Res Public Health. 2021 Feb 21;18(4):2086. doi: 10.3390/ijerph18042086.
7. Impact of a Commercial Artificial Intelligence-Driven Patient Self-Assessment Solution on Waiting Times at General Internal Medicine Outpatient Departments: Retrospective Study. JMIR Med Inform. 2020 Aug 31;8(8):e21056. doi: 10.2196/21056.
8. Multimorbidity and patient-reported diagnostic errors in the primary care setting: multicentre cross-sectional study in Japan. BMJ Open. 2020 Aug 20;10(8):e039040. doi: 10.1136/bmjopen-2020-039040.
9. Factors and impact of physicians' diagnostic errors in malpractice claims in Japan. PLoS One. 2020 Aug 3;15(8):e0237145. doi: 10.1371/journal.pone.0237145. eCollection 2020.
10. Adaptive trust calibration for human-AI collaboration. PLoS One. 2020 Feb 21;15(2):e0229132. doi: 10.1371/journal.pone.0229132. eCollection 2020.