Assessing adult sinusitis guidelines: A comparative analysis of AAO-HNS and AI Chatbots.

Authors

Edalati Shaun, Sharma Shiven, Guda Rahul, Vasan Vikram, Mohamed Shahed, Gidumal Sunder, Govindaraj Satish, Iloreta Alfred Marc

Affiliations

Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication information

Am J Otolaryngol. 2025 Jan-Feb;46(1):104563. doi: 10.1016/j.amjoto.2024.104563. Epub 2025 Jan 29.

DOI:10.1016/j.amjoto.2024.104563
PMID:39884919
Abstract

OBJECTIVE

To compare chatbot responses with the adult sinusitis guidelines of the American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNS).

METHODS

ChatGPT-3.5, ChatGPT-4.0, Bard, and Llama 2 represent openly accessible large language model-based chatbots. The accuracy, over-conclusiveness, supplemental content, and incompleteness of chatbot responses were assessed against the AAO-HNS adult sinusitis clinical guidelines.

RESULTS

Twelve guidelines comprising 30 questions from the AAO-HNS were compared with the responses of 4 chatbots. Adherence to the AAO-HNS guidelines varied: Bard gave accurate responses to 83.3% of questions, Llama 2 and ChatGPT-4.0 to 80% each, and ChatGPT-3.5 to 73.3%. Over-conclusive responses were rare, with only one instance each from Llama 2 and ChatGPT-4.0. Rates of incomplete responses varied more widely: Llama 2 had the highest at 40%, followed by ChatGPT-3.5 at 36.7%, ChatGPT-4.0 at 33.3%, and Bard at 23.3%. Fisher's exact test showed that all chatbots deviated significantly from the guideline standard, with lower accuracy (p = 0.012 for Llama 2, p = 0.026 for Bard, p = 0.012 for ChatGPT-4.0, p = 0.002 for ChatGPT-3.5), inclusion of supplemental data (p < 0.001 for all), and lower completeness (p < 0.01 for all), indicating areas where their performance could improve.
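The accuracy p-values can be reproduced with a short sketch, under assumptions the abstract does not state: that each chatbot's 30 answers were compared against a guideline reference scored as 30 of 30 accurate, that the test was one-sided, and that the accurate counts are the reported percentages multiplied by 30 (24, 25, 24, and 22). The function name `fisher_one_sided` is hypothetical and is not taken from the paper.

```python
from math import comb


def fisher_one_sided(ref_ok: int, ref_n: int, bot_ok: int, bot_n: int) -> float:
    """One-sided Fisher exact p-value for the chatbot being less accurate than the reference."""
    total_ok = ref_ok + bot_ok
    total_n = ref_n + bot_n
    # Smallest feasible number of accurate chatbot answers given the fixed margins.
    k_min = max(0, total_ok - ref_n)
    # Sum the hypergeometric weights of every table in which the chatbot
    # has at most bot_ok accurate answers.
    favourable = sum(
        comb(bot_n, k) * comb(ref_n, total_ok - k) for k in range(k_min, bot_ok + 1)
    )
    return favourable / comb(total_n, total_ok)


# Assumed counts: reported accuracy percentages multiplied by 30 questions.
accurate = {"Llama 2": 24, "Bard": 25, "ChatGPT-4.0": 24, "ChatGPT-3.5": 22}

# Assumed reference: the guideline itself scored as 30 of 30 accurate.
p_values = {name: fisher_one_sided(30, 30, ok, 30) for name, ok in accurate.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```

Under these assumptions the values round to the reported 0.012, 0.026, 0.012, and 0.002, which suggests, but does not confirm, that the authors used a one-sided comparison against a perfect reference.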

CONCLUSION

Although AI chatbots such as Llama 2, Bard, and ChatGPT show potential for sharing health-related information, their current performance in answering clinical questions about adult rhinosinusitis falls short of recognized clinical standards. Future revisions should address these shortcomings and emphasize accuracy, completeness, and conformity with evidence-based practice.

Similar articles

1
Assessing adult sinusitis guidelines: A comparative analysis of AAO-HNS and AI Chatbots.
Am J Otolaryngol. 2025 Jan-Feb;46(1):104563. doi: 10.1016/j.amjoto.2024.104563. Epub 2025 Jan 29.
2
Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures.
Cureus. 2024 Mar 23;16(3):e56766. doi: 10.7759/cureus.56766. eCollection 2024 Mar.
3
Talking technology: exploring chatbots as a tool for cataract patient education.
Clin Exp Optom. 2025 Jan;108(1):56-64. doi: 10.1080/08164622.2023.2298812. Epub 2024 Jan 9.
4
Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI.
Cureus. 2024 Jan 2;16(1):e51544. doi: 10.7759/cureus.51544. eCollection 2024 Jan.
5
Performance of ChatGPT-4 and Bard chatbots in responding to common patient questions on prostate cancer Lu-PSMA-617 therapy.
Front Oncol. 2024 Jul 12;14:1386718. doi: 10.3389/fonc.2024.1386718. eCollection 2024.
6
A study of adherence to the AAO-HNS "Clinical Practice Guideline: Adult Sinusitis".
Ear Nose Throat J. 2014 Aug;93(8):338-52. doi: 10.1177/014556131409300813.
7
Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.
Vascular. 2025 Feb;33(1):229-237. doi: 10.1177/17085381241240550. Epub 2024 Mar 18.
8
Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study.
J Med Internet Res. 2024 Nov 4;26:e60291. doi: 10.2196/60291.
9
The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries.
J Bone Miner Res. 2024 Mar 22;39(2):106-115. doi: 10.1093/jbmr/zjad007.
10
Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard.
Int J Obstet Anesth. 2025 Feb;61:104317. doi: 10.1016/j.ijoa.2024.104317. Epub 2024 Dec 20.