Revercomb Lucy, Patel Aman M, Fu Daniel, Filimonov Andrey
Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 185 S Orange Ave, Newark, NJ 07103, USA.
Indian J Otolaryngol Head Neck Surg. 2024 Dec;76(6):6112-6114. doi: 10.1007/s12070-024-04935-x. Epub 2024 Aug 3.
GPT-4, recently released by OpenAI, improves upon GPT-3.5 with increased reliability and expanded capabilities, including user-specified, customizable GPT-4 models. This study aims to compare the performance of GPT-4 and GPT-3.5 on Otolaryngology board-style questions.
A total of 150 Otolaryngology board-style questions were obtained from the BoardVitals question bank. These questions, previously assessed with GPT-3.5, were input into standard GPT-4 and into a custom GPT-4 model instructed to specialize in Otolaryngology board-style questions, emphasize precision, and provide evidence-based explanations.
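For readers who want to reproduce a comparable workflow programmatically, the following is a minimal sketch of submitting one board-style question to GPT-4 with and without custom instructions. The study used the ChatGPT interface and a custom GPT model; the OpenAI API call, the model identifier, and the system prompt wording below are illustrative assumptions rather than the authors' exact setup.

```python
# Hedged sketch: submit a multiple-choice question to GPT-4 via the OpenAI API.
# The model name, system prompt, and scoring workflow are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CUSTOM_INSTRUCTIONS = (
    "You are an expert in Otolaryngology board-style questions. "
    "Select exactly one answer choice, emphasize precision, and "
    "provide an evidence-based explanation."
)

def ask_question(question_text: str, use_custom_instructions: bool) -> str:
    """Submit one question and return the model's reply for manual scoring."""
    messages = []
    if use_custom_instructions:
        messages.append({"role": "system", "content": CUSTOM_INSTRUCTIONS})
    messages.append({"role": "user", "content": question_text})
    response = client.chat.completions.create(
        model="gpt-4",   # assumed model identifier
        temperature=0,   # deterministic answers to make scoring repeatable
        messages=messages,
    )
    return response.choices[0].message.content
```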
Standard GPT-4 correctly answered 72.0% and custom GPT-4 correctly answered 81.3% of the questions, vs. GPT-3.5, which answered 51.3% of the same questions correctly. On multivariable analysis, custom GPT-4 had higher odds of correctly answering questions than standard GPT-4 (adjusted odds ratio 2.19, p = 0.015). Both standard GPT-4 and custom GPT-4 demonstrated decreased performance on questions rated hard compared with those rated easy (p < 0.001).
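Adjusted odds ratios of this kind would typically come from a multivariable logistic regression on per-question correctness. Below is a hedged sketch using statsmodels; the variable names, covariate set, reference levels, and placeholder data are assumptions, not the authors' exact analysis.

```python
# Hedged sketch: multivariable logistic regression giving adjusted odds ratios
# for correctness by model (standard vs. custom GPT-4) and question difficulty.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# One row per question per model: correct (0/1), model, difficulty.
# Placeholder data only; real rows would come from the scored question set.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "correct":    rng.binomial(1, 0.75, 300),
    "model":      ["standard", "custom"] * 150,
    "difficulty": rng.choice(["easy", "moderate", "hard"], 300),
})

fit = smf.logit(
    "correct ~ C(model, Treatment('standard')) + C(difficulty, Treatment('easy'))",
    data=df,
).fit()

adjusted_or = np.exp(fit.params)  # exponentiated coefficients = adjusted odds ratios
print(pd.concat([adjusted_or, fit.pvalues], axis=1, keys=["aOR", "p"]))
```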
Our study suggests that GPT-4 has higher accuracy than GPT-3.5 in answering Otolaryngology board-style questions. Our custom GPT-4 model demonstrated higher accuracy than standard GPT-4, potentially as a result of its instructions to specialize in Otolaryngology board-style questions, select exactly one answer, and emphasize precision. This demonstrates that custom models may further enhance the utility of ChatGPT in medical education.