Performance evaluation of ChatGPT in detecting diagnostic errors and their contributing factors: an analysis of 545 case reports of diagnostic errors.

Affiliations

Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga-gun, Tochigi, Japan.

Urasoe General Hospital, Urasoe, Okinawa, Japan.

Publication information

BMJ Open Qual. 2024 Jun 3;13(2):e002654. doi: 10.1136/bmjoq-2023-002654.

Abstract

BACKGROUND

Manual chart review using validated assessment tools is a standardised methodology for detecting diagnostic errors, but it requires considerable human resources and time. ChatGPT, a recently developed artificial intelligence chatbot based on a large language model, can classify text effectively when given suitable prompts. ChatGPT could therefore assist manual chart review in detecting diagnostic errors.

OBJECTIVE

This study aimed to clarify whether ChatGPT could correctly detect diagnostic errors and possible factors contributing to them based on case presentations.

METHODS

We analysed 545 published case reports that described diagnostic errors. We input the texts of the case presentations and the final diagnoses, together with several original prompts, into ChatGPT (GPT-4) to generate responses that included a judgement on whether a diagnostic error had occurred and the factors contributing to it. Factors contributing to diagnostic errors were coded according to three taxonomies: Diagnosis Error Evaluation and Research (DEER), Reliable Diagnosis Challenges (RDC) and Generic Diagnostic Pitfalls (GDP). ChatGPT's responses on the contributing factors were compared with those of physicians.
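The abstract describes this pipeline only at a high level, and the authors' actual prompts are not reproduced here. As a rough illustration, the following is a minimal Python sketch of this kind of workflow; the model name, prompt wording and output format are my own assumptions, not the study's materials.

```python
# Minimal sketch of the kind of pipeline described above: send a case
# presentation and final diagnosis to GPT-4 and ask for a diagnostic-error
# judgement plus taxonomy-coded contributing factors. The prompt wording
# and output format are illustrative assumptions, not the study's actual
# materials. Requires the `openai` package and an OPENAI_API_KEY variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """You are reviewing a published case report.

Case presentation:
{case_text}

Final diagnosis:
{final_diagnosis}

1. Did a diagnostic error occur in this case? Answer yes or no.
2. List the factors that contributed to the error, coded against the
   DEER, RDC and GDP taxonomies, one code per line.
"""


def review_case(case_text: str, final_diagnosis: str) -> str:
    """Return ChatGPT's judgement and taxonomy codes for one case report."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # keep the categorical coding as stable as possible
        messages=[{
            "role": "user",
            "content": PROMPT_TEMPLATE.format(
                case_text=case_text, final_diagnosis=final_diagnosis),
        }],
    )
    return response.choices[0].message.content
```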

RESULTS

ChatGPT correctly detected diagnostic errors in 519/545 cases (95%) and coded significantly more contributing factors per case than physicians: DEER (median 5 vs 1, p<0.001), RDC (median 4 vs 2, p<0.001) and GDP (median 4 vs 1, p<0.001). The contributing factors most frequently coded by ChatGPT were 'failure/delay in considering the diagnosis' (315, 57.8%) in DEER, 'atypical presentation' (365, 67.0%) in RDC, and 'atypical presentation' (264, 48.4%) in GDP.
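The abstract does not name the statistical test behind these p values. For illustration only, here is a minimal sketch of how such a per-case comparison could be run, assuming a Wilcoxon signed-rank test on paired counts; the counts below are fabricated stand-ins, not study data.

```python
# The test behind the reported p values is not named in the abstract; this
# sketch assumes a Wilcoxon signed-rank test on paired per-case factor
# counts. The counts are fabricated stand-ins, not study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
chatgpt_deer = rng.poisson(5, size=545)    # hypothetical per-case DEER counts
physician_deer = rng.poisson(1, size=545)  # hypothetical per-case DEER counts

stat, p = stats.wilcoxon(chatgpt_deer, physician_deer)
print(f"median ChatGPT = {np.median(chatgpt_deer):.0f}, "
      f"median physicians = {np.median(physician_deer):.0f}, p = {p:.3g}")
```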

CONCLUSION

ChatGPT accurately detects diagnostic errors from case presentations. It may be more sensitive than manual review in detecting factors that contribute to diagnostic errors, especially 'atypical presentation'.

