Morishita Masaki, Fukuda Hikaru, Yamaguchi Shino, Muraoka Kosuke, Nakamura Taiji, Hayashi Masanari, Yoshioka Izumi, Ono Kentaro, Awano Shuji
Division of Clinical Education Development and Research, Department of Oral Function, Kyushu Dental University, Kitakyushu, Japan.
Health Information Management Office, Kyushu Dental University Hospital, Kitakyushu, Japan.
Saudi Dent J. 2024 Dec;36(12):1577-1581. doi: 10.1016/j.sdentj.2024.11.006. Epub 2024 Nov 26.
Multiple large language models (LLMs) have been released since 2022, including OpenAI's GPT-3.5 and GPT-4. The latest model, GPT-4o, introduced on May 13, 2024, is a significant improvement over GPT-4. Previous studies have shown the potential of LLMs as educational tools in the context of medical and dental examinations. This study evaluates the accuracy of GPT-4 and GPT-4o responses to questions from the Japanese National Dental Examination (JNDE) to assess their potential as tools for dental education.
We obtained the dataset of the 117th JNDE, administered in January 2024, which consists of 360 questions. After excluding questions containing images and questions deemed inappropriate, 202 questions were selected. Responses were generated with GPT-4 and GPT-4o, using standardized prompts to ensure consistent input. Data were analyzed with Qlik Sense® and GraphPad Prism, using Fisher's exact test.
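As an illustration of the statistical comparison named above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of correct and incorrect answers for two models. It is not the authors' analysis, which was run in GraphPad Prism. The function name `fisher_exact_two_sided` and the counts in the usage lines are assumptions made for illustration only; they are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: row 1 = model A (correct, incorrect),
    row 2 = model B (correct, incorrect).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)

    # Sum the probabilities of every table at least as unlikely as the
    # observed one; the small tolerance guards against float rounding.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Placeholder counts, NOT the study's data: 80/100 vs 65/100 correct.
    p = fisher_exact_two_sided(80, 20, 65, 35)
    print(f"two-sided p = {p:.4f}")
```

The sketch uses only the standard library so that the calculation is visible; in practice a statistics package would normally be used instead.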
GPT-4o showed a significantly higher correct response rate (73.8%) than GPT-4 (63.3%). In the compulsory section, GPT-4o achieved 88.6% accuracy, significantly higher than GPT-4's 74.3%. In the general section, GPT-4o also scored higher (66.4%) than GPT-4 (58.0%), although this difference was not statistically significant.
GPT-4o significantly outperformed GPT-4 in accuracy on JNDE questions, suggesting improved potential as a tool for dental education. Further studies are needed to evaluate GPT-4o's capabilities with visual materials and on more diverse question sets before its utility in educational settings can be fully established.