
Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights.

Author Information

Terwilliger Emma, Bcharah George, Bcharah Hend, Bcharah Estefana, Richardson Clare, Scheffler Patrick

Affiliations

Otolaryngology, Mayo Clinic Alix School of Medicine, Scottsdale, USA.

Otolaryngology, Andrew Taylor Still University School of Osteopathic Medicine, Mesa, USA.

Publication Information

Cureus. 2024 Jul 9;16(7):e64204. doi: 10.7759/cureus.64204. eCollection 2024 Jul.


DOI: 10.7759/cureus.64204
PMID: 39130878
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11315421/
Abstract

Objective: To evaluate and compare the performance of Chat Generative Pre-Trained Transformer (ChatGPT), GPT-4, and Google Bard on United States otolaryngology board-style questions, and to assess their potential as adjunctive study tools and resources for students and doctors.

Methods: A total of 1077 text-based questions and 60 image-based questions from the otolaryngology board exam preparation tool BoardVitals were entered into ChatGPT, GPT-4, and Google Bard. Each question was scored as true or false, depending on whether the artificial intelligence (AI) model provided the correct response. Data analysis was performed in RStudio.

Results: GPT-4 scored highest at 78.7%, compared with 55.3% for ChatGPT and 61.7% for Bard (p<0.001). By question difficulty, all three AI models performed best on easy questions (ChatGPT: 69.7%, GPT-4: 92.5%, Bard: 76.4%) and worst on hard questions (ChatGPT: 42.3%, GPT-4: 61.3%, Bard: 45.6%). Across all difficulty levels, GPT-4 outperformed Bard and ChatGPT (p<0.0001). GPT-4 also outperformed ChatGPT and Bard in every subspecialty section, with significantly higher scores (p<0.05) on all sections except allergy (p>0.05). On image-based questions, GPT-4 scored numerically higher than Bard (56.7% vs 46.4%), although the difference was not statistically significant (p=0.368), and showed better overall image interpretation capabilities.

Conclusion: This study showed that GPT-4 performed better than both ChatGPT and Bard on United States otolaryngology board practice questions. Although the GPT-4 results are promising, AI should still be used with caution in medical education and patient care settings.
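The reported accuracy comparison (three models over the same 1077 text-based questions) lends itself to a chi-square test of independence on a models x (correct, incorrect) contingency table. The sketch below is illustrative only: the paper states the analysis was done in RStudio, whereas this version uses Python/SciPy, and the counts are reconstructed by rounding the reported percentages rather than taken from the study's raw data.

```python
# Minimal sketch (illustrative, not the authors' analysis): compare the three
# models' overall accuracies with a chi-square test of independence.
from scipy.stats import chi2_contingency

N = 1077  # text-based questions answered by each model (from the abstract)
accuracies = {"ChatGPT": 0.553, "GPT-4": 0.787, "Bard": 0.617}  # reported

# Build a models x (correct, incorrect) contingency table. Counts are
# reconstructed from the reported percentages, so they are approximate.
table = [[round(acc * N), N - round(acc * N)] for acc in accuracies.values()]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")
```

With counts of this size and gaps this wide, the resulting p-value falls well below 0.001, consistent with the overall comparison reported in the abstract.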

Figures (from PMC11315421):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29f6/11315421/39462ec6e760/cureus-0016-00000064204-i01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29f6/11315421/5febbf557de4/cureus-0016-00000064204-i02.jpg

Similar Articles

[1]
Advancing Medical Education: Performance of Generative Artificial Intelligence Models on Otolaryngology Board Preparation Questions With Image Analysis Insights.

Cureus. 2024-7-9

[2]
Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions-an observational study.

Int Orthop. 2024-8

[3]
Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society.

Jpn J Radiol. 2024-2

[4]
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

World Neurosurg. 2023-11

[5]
Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank.

Neurosurgery. 2023-11-1

[6]
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.

JMIR Med Educ. 2024-2-21

[7]
Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions.

Cureus. 2024-3-11

[8]
Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions.

Adv Med Educ Pract. 2024-9-20

[9]
Generative Artificial Intelligence Performs at a Second-Year Orthopedic Resident Level.

Cureus. 2024-3-13

[10]
Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.

Clin Exp Nephrol. 2024-5

Cited By

[1]
Applications of Natural Language Processing in Otolaryngology: A Scoping Review.

Laryngoscope. 2025-9

[2]
Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry.

J Esthet Restor Dent. 2025-7

[3]
Advancements in AI Medical Education: Assessing ChatGPT's Performance on USMLE-Style Questions Across Topics and Difficulty Levels.

Cureus. 2024-12-24

[4]
Response to: comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o: correspondence.

Clin Rheumatol. 2024-12

References

[1]
Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation.

JMIR AI. 2024-5-31

[2]
Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study.

JMIR Med Educ. 2024-3-28

[3]
Physician views of artificial intelligence in otolaryngology and rhinology: A mixed methods study.

Laryngoscope Investig Otolaryngol. 2023-10-31

[4]
Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination.

Sci Rep. 2023-11-22

[5]
Evaluating ChatGPT Performance on the Orthopaedic In-Training Examination.

JB JS Open Access. 2023-9-8

[6]
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment.

Can Assoc Radiol J. 2024-5

[7]
Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.

JMIR Med Educ. 2023-6-29

[8]
Performance of ChatGPT, GPT-4, and Google Bard on a Neurosurgery Oral Boards Preparation Question Bank.

Neurosurgery. 2023-11-1

[9]
ChatGPT's quiz skills in different otolaryngology subspecialties: an analysis of 2576 single-choice and multiple-choice board certification preparation questions.

Eur Arch Otorhinolaryngol. 2023-9

[10]
ChatGPT takes on the European Exam in Core Cardiology: an artificial intelligence success story?

Eur Heart J Digit Health. 2023-4-24
