
Performance of AI-powered chatbots in diagnosing acute pulmonary thromboembolism from given clinical vignettes.

Affiliations

M.D., MSc, Department of Emergency Medicine, Republic of Turkey, Ministry of Health, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

M.D., Department of Emergency Medicine, Republic of Turkey, Ministry of Health, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

Publication information

Acute Med. 2024;23(2):66-74.

Abstract

BACKGROUND

Chatbots hold great potential to serve as support tools in the diagnostic and clinical decision-making process. In this study, we aimed to evaluate the accuracy of chatbots in diagnosing pulmonary embolism (PE). Furthermore, we assessed their performance in determining PE severity.

METHOD

Sixty-five case reports meeting our inclusion criteria were selected for this study. Two emergency medicine (EM) physicians crafted clinical vignettes and presented them to Bard, Bing, and ChatGPT-3.5, asking each for the top 10 diagnoses. After all differential-diagnosis lists were obtained, the vignettes, enriched with supplemental data, were resubmitted to the chatbots with a request to determine the severity of PE.

RESULTS

ChatGPT-3.5, Bing, and Bard listed PE within the top 10 diagnoses with accuracy rates of 92.3%, 92.3%, and 87.6%, respectively. For the top 3 diagnoses, Bard achieved 75.4% accuracy, while ChatGPT-3.5 and Bing both reached 67.7%. As the top diagnosis, Bard, ChatGPT-3.5, and Bing were accurate in 56.9%, 47.7%, and 30.8% of cases, respectively; in this group, the differences between Bard and both Bing (p<0.001) and ChatGPT-3.5 (p=0.007) were significant. Massive PEs were correctly identified with an over 85% success rate. Overclassification rates for Bard, ChatGPT-3.5, and Bing were 38.5%, 23.3%, and 20%, respectively. Misclassification rates were highest in the submassive group.
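The accuracy figures above are top-k hit rates: for each vignette, a chatbot scores a hit when PE appears within the first k entries of its ranked differential-diagnosis list. A minimal sketch of that metric, using invented example outputs rather than the study's data:

```python
# Hypothetical sketch of the top-k accuracy metric described above.
# Each chatbot returns a ranked differential-diagnosis list per vignette;
# a hit is counted when the target diagnosis appears within the top k.
# The vignette outputs below are invented for illustration only.

def top_k_accuracy(ranked_lists, target, k):
    """Fraction of vignettes whose top-k list contains the target diagnosis."""
    hits = sum(
        1 for dx_list in ranked_lists
        if target.lower() in (d.lower() for d in dx_list[:k])
    )
    return hits / len(ranked_lists)

# Three toy vignettes with ranked chatbot outputs (not real study data).
outputs = [
    ["Pulmonary embolism", "Pneumonia", "Acute coronary syndrome"],
    ["Pneumothorax", "Pulmonary embolism", "Pericarditis"],
    ["Aortic dissection", "Pneumonia", "Heart failure"],
]

print(top_k_accuracy(outputs, "Pulmonary embolism", 1))  # 1 of 3 vignettes
print(top_k_accuracy(outputs, "Pulmonary embolism", 3))  # 2 of 3 vignettes
```

With 65 vignettes, a top-10 hit in 60 of them yields the reported ~92.3% (60/65); the abstract does not state the significance test behind the quoted p-values, so pairwise comparisons would need the original paper.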

CONCLUSION

Although chatbots are not intended for diagnostic use, their high diagnostic accuracy and success rate in identifying massive PE underscore their promising potential as clinical decision support tools. However, further research with larger patient datasets is required to validate and refine their performance in real-world clinical settings.

