An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination.

Authors

Morishita Masaki, Fukuda Hikaru, Yamaguchi Shino, Muraoka Kosuke, Nakamura Taiji, Hayashi Masanari, Yoshioka Izumi, Ono Kentaro, Awano Shuji

Affiliations

Division of Clinical Education Development and Research, Department of Oral Function, Kyushu Dental University, Kitakyushu, Japan.

Health Information Management Office, Kyushu Dental University Hospital, Kitakyushu, Japan.

Publication information

Saudi Dent J. 2024 Dec;36(12):1577-1581. doi: 10.1016/j.sdentj.2024.11.006. Epub 2024 Nov 26.

Abstract

BACKGROUND AND OBJECTIVES

Multiple large language models (LLMs) have been released since 2022, including OpenAI's GPT-3.5 and GPT-4. The latest model, GPT-4o, introduced on May 13, 2024, significantly improves on GPT-4. Previous studies have shown the potential of LLMs as educational tools in medical and dental exams. This study evaluates the accuracy of GPT-4 and GPT-4o responses to the Japanese National Dental Examination (JNDE) to assess their potential as tools for dental education.

MATERIALS AND METHODS

We obtained the dataset of the 117th JNDE, administered in January 2024, consisting of 360 questions. After excluding questions that contained images and questions deemed inappropriate, 202 questions were selected. GPT-4 and GPT-4o were used to generate responses. Standardized prompts ensured consistent input. Data were analyzed in Qlik Sense® and GraphPad Prism using Fisher's exact test.
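As an illustration of the statistical comparison named above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of correct and incorrect answers for two models. It is not the authors' analysis, which used GraphPad Prism. The function name `fisher_exact_two_sided` and the example counts are hypothetical, since the abstract reports percentages only and not the underlying counts.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. two models); columns are the two outcomes
    (e.g. correct, incorrect). All margins are treated as fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts for illustration only (not taken from the study):
# model A answers 90 of 100 questions correctly, model B answers 75 of 100.
p = fisher_exact_two_sided(90, 10, 75, 25)
print(f"two-sided p-value: {p:.4f}")
```

The exact test is a common choice for comparisons like this because it makes no large-sample approximation, which matters when a section contains relatively few questions.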

RESULTS

GPT-4o showed a significantly higher correct response rate (73.8%) than GPT-4 (63.3%). In the compulsory section, GPT-4o achieved 88.6% accuracy, significantly higher than GPT-4's 74.3%. In the general section, GPT-4o (66.4%) also scored higher than GPT-4 (58.0%), although the difference was not statistically significant.

CONCLUSION

GPT-4o significantly outperformed GPT-4 in accuracy on JNDE questions, suggesting improved potential as a tool for dental education. Further studies are needed to evaluate GPT-4o's capabilities with visual materials and on more diverse question sets to fully ascertain its utility in educational settings.
