ChatGPT's adherence to otolaryngology clinical practice guidelines.

Author Information

Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel.

School of Medicine, Tel Aviv University, Tel Aviv, Israel.

Publication Information

Eur Arch Otorhinolaryngol. 2024 Jul;281(7):3829-3834. doi: 10.1007/s00405-024-08634-9. Epub 2024 Apr 22.

Abstract

OBJECTIVES

Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy on clinical topics is critical. Here we assessed ChatGPT's adherence to the clinical practice guidelines of the American Academy of Otolaryngology-Head and Neck Surgery.

METHODS

We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology-Head and Neck Surgery. Each question was posed three times (N = 72) to test the model's consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen's kappa was used to measure evaluator agreement, and Cronbach's alpha assessed the consistency of ChatGPT's responses across repetitions.
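The two reliability statistics named above can be reproduced with standard tooling. Below is a minimal sketch in Python, not the authors' analysis code: the three-level rating scale, the simulated score arrays, and the cronbach_alpha helper are illustrative assumptions, while Cohen's kappa comes from scikit-learn.

```python
# Sketch of the two reliability statistics from the Methods, on
# hypothetical data: two evaluators score 72 responses (24 questions
# asked three times each). None of the numbers are from the paper.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical accuracy ratings (0 = contradicts guideline,
# 1 = partially accurate, 2 = highly accurate) from two evaluators,
# with the second rater disagreeing on a handful of responses.
rng = np.random.default_rng(0)
rater_a = rng.integers(0, 3, size=72)
rater_b = rater_a.copy()
rater_b[rng.choice(72, size=10, replace=False)] = rng.integers(0, 3, size=10)

# Inter-rater agreement between the two otolaryngologists.
kappa = cohen_kappa_score(rater_a, rater_b)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_questions, n_repeats) score matrix;
    the repeats of each question play the role of the 'items'."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# 24 questions x 3 repetitions: consistency of the scored answers.
scores = rater_a.reshape(24, 3)
alpha = cronbach_alpha(scores)
print(f"Cohen's kappa = {kappa:.2f}, Cronbach's alpha = {alpha:.2f}")
```

In this framing, the three repetitions of a question act as the "items" in Cronbach's alpha, so a high alpha indicates that the model's scored answers to a question change little from run to run.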

RESULTS

The study revealed mixed results: 59.7% (43/72) of ChatGPT's responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). The model's responses were consistent for 17/24 questions (70.8%), with a Cronbach's alpha of 0.87, indicating reasonable consistency across tests.

CONCLUSIONS

On a structured, guideline-based question set, ChatGPT demonstrated consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before real-world clinical use can be considered.
