
Performance of AI-powered chatbots in diagnosing acute pulmonary thromboembolism from given clinical vignettes.

Affiliations

M.D., MSc, Department of Emergency Medicine, Republic of Turkey, Ministry of Health, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

M.D., Department of Emergency Medicine, Republic of Turkey, Ministry of Health, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

Publication information

Acute Med. 2024;23(2):66-74.

Abstract

BACKGROUND

Chatbots hold great potential to serve as support tools in the diagnostic and clinical decision-making process. In this study, we aimed to evaluate the accuracy of chatbots in diagnosing pulmonary embolism (PE). Furthermore, we assessed their performance in determining PE severity.

METHOD

Sixty-five case reports meeting our inclusion criteria were selected for this study. Two emergency medicine (EM) physicians crafted clinical vignettes and presented them to Bard, Bing, and ChatGPT-3.5, asking each for the top 10 diagnoses. After all differential-diagnosis lists were obtained, the vignettes, enriched with supplemental data, were resubmitted to the chatbots with a request to determine the severity of PE.

RESULTS

ChatGPT-3.5, Bing, and Bard listed PE within the top 10 diagnoses with accuracy rates of 92.3%, 92.3%, and 87.6%, respectively. For the top 3 diagnoses, Bard achieved 75.4% accuracy, while ChatGPT-3.5 and Bing both reached 67.7%. As the top diagnosis, Bard, ChatGPT-3.5, and Bing were accurate in 56.9%, 47.7%, and 30.8% of cases, respectively; in this group, the differences between Bard and both Bing (p<0.001) and ChatGPT-3.5 (p=0.007) were significant. Massive PEs were correctly identified with an over 85% success rate. Overclassification rates for Bard, ChatGPT-3.5, and Bing were 38.5%, 23.3%, and 20%, respectively. Misclassification rates were highest in the submassive group.
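The accuracy figures above are top-k hit rates: for each vignette, a chatbot scores a hit when PE appears within the first k entries of its ranked differential-diagnosis list. A minimal sketch of that metric, using invented example outputs rather than the study's data:

```python
# Hypothetical sketch of the top-k accuracy metric described above.
# Each chatbot returns a ranked differential-diagnosis list per vignette;
# a hit is counted when the target diagnosis appears within the top k.
# The vignette outputs below are invented for illustration only.

def top_k_accuracy(ranked_lists, target, k):
    """Fraction of vignettes whose top-k list contains the target diagnosis."""
    hits = sum(
        1 for dx_list in ranked_lists
        if target.lower() in (d.lower() for d in dx_list[:k])
    )
    return hits / len(ranked_lists)

# Three toy vignettes with ranked chatbot outputs (not real study data).
outputs = [
    ["Pulmonary embolism", "Pneumonia", "Acute coronary syndrome"],
    ["Pneumothorax", "Pulmonary embolism", "Pericarditis"],
    ["Aortic dissection", "Pneumonia", "Heart failure"],
]

print(top_k_accuracy(outputs, "Pulmonary embolism", 1))  # 1 of 3 vignettes
print(top_k_accuracy(outputs, "Pulmonary embolism", 3))  # 2 of 3 vignettes
```

With 65 vignettes, a top-10 hit in 60 of them yields the reported ~92.3% (60/65); the abstract does not state the significance test behind the quoted p-values, so pairwise comparisons would need the original paper.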

CONCLUSION

Although chatbots are not intended for diagnostic use, their high diagnostic accuracy and success rate in identifying massive PE underscore their promising potential as clinical decision support tools. However, further research with larger patient datasets is required to validate and refine their performance in real-world clinical settings.

