Performance of Novel GPT-4 in Otolaryngology Knowledge Assessment.

Author Information

Revercomb Lucy, Patel Aman M, Fu Daniel, Filimonov Andrey

Affiliation

Department of Otolaryngology - Head and Neck Surgery, Rutgers New Jersey Medical School, 185 S Orange Ave, Newark, NJ 07103, USA.

Publication Information

Indian J Otolaryngol Head Neck Surg. 2024 Dec;76(6):6112-6114. doi: 10.1007/s12070-024-04935-x. Epub 2024 Aug 3.

Abstract

PURPOSE

GPT-4, recently released by OpenAI, improves upon GPT-3.5 with increased reliability and expanded capabilities, including user-specified, customizable GPT-4 models. This study aims to compare the performance of GPT-4 with that of GPT-3.5 on Otolaryngology board-style questions.

METHODS

150 Otolaryngology board-style questions were obtained from the BoardVitals question bank. These questions, which had previously been assessed with GPT-3.5, were input into standard GPT-4 and into a custom GPT-4 model designed to specialize in Otolaryngology board-style questions, emphasize precision, and provide evidence-based explanations.
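
For context, the study's custom model was configured through ChatGPT's custom GPT interface rather than through code. The following is a minimal sketch, assuming the OpenAI Python API, of how comparable custom instructions could be supplied as a system prompt; the model name, prompt wording, and example question are illustrative assumptions, not the authors' exact configuration.

import os
from openai import OpenAI

# Hypothetical custom instructions mirroring those described in the study:
# specialize in Otolaryngology board-style questions, select exactly one
# answer, emphasize precision, and give an evidence-based explanation.
CUSTOM_INSTRUCTIONS = (
    "You are a specialist in Otolaryngology board-style questions. "
    "Select exactly one answer choice, emphasize precision, and provide "
    "a concise evidence-based explanation."
)

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # assumes a key is set

def answer_question(question: str, use_custom_instructions: bool) -> str:
    """Send one board-style question to GPT-4 and return the answer text."""
    messages = []
    if use_custom_instructions:
        messages.append({"role": "system", "content": CUSTOM_INSTRUCTIONS})
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

# Example usage with a placeholder question string:
# print(answer_question("Which nerve is most at risk during ...?", True))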

RESULTS

Standard GPT-4 correctly answered 72.0% and custom GPT-4 correctly answered 81.3% of the questions, vs. GPT-3.5, which answered 51.3% of the same questions correctly. On multivariable analysis, custom GPT-4 had higher odds of correctly answering questions than standard GPT-4 (adjusted odds ratio 2.19, P = 0.015). Both standard GPT-4 and custom GPT-4 demonstrated a decrease in performance between questions rated as easy and those rated as hard (P < 0.001).
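
As a rough illustration of how an adjusted odds ratio of this kind can be derived, the sketch below fits a logistic regression of answer correctness on model type while adjusting for question difficulty, using statsmodels. The toy data frame, column names, and covariates are assumptions for illustration only; the abstract does not specify the authors' exact regression model.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-question outcomes: one row per (question, model) pair.
df = pd.DataFrame({
    "correct":    [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0],
    "model":      ["custom"] * 8 + ["standard"] * 8,
    "difficulty": (["easy"] * 4 + ["hard"] * 4) * 2,
})

# Logistic regression: correctness ~ model type + question difficulty.
fit = smf.logit(
    "correct ~ C(model, Treatment(reference='standard')) + C(difficulty)",
    data=df,
).fit(disp=False)

# Exponentiated coefficients are adjusted odds ratios; the model term gives the
# odds of a correct answer for the custom model relative to the standard one.
print(np.exp(fit.params))
print(fit.pvalues)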

CONCLUSIONS

Our study suggests that GPT-4 has higher accuracy than GPT-3.5 in answering Otolaryngology board-style questions. Our custom GPT-4 model demonstrated higher accuracy than standard GPT-4, potentially as a result of its instructions to specialize in Otolaryngology board-style questions, select exactly one answer, and emphasize precision. This demonstrates that custom models may further enhance the utilization of ChatGPT in medical education.

Similar Articles

Performance of Novel GPT-4 in Otolaryngology Knowledge Assessment.
Indian J Otolaryngol Head Neck Surg. 2024 Dec;76(6):6112-6114. doi: 10.1007/s12070-024-04935-x. Epub 2024 Aug 3.

Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.
Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.
