

Performance of ChatGPT in answering the oral pathology questions of various types or subjects from Taiwan National Dental Licensing Examinations.

Author information

Wu Yu-Hsueh, Tso Kai-Yun, Chiang Chun-Pin

Affiliations

Department of Stomatology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan.

Institute of Oral Medicine, School of Dentistry, National Cheng Kung University, Tainan, Taiwan.

Publication information

J Dent Sci. 2025 Jul;20(3):1709-1715. doi: 10.1016/j.jds.2025.03.030. Epub 2025 Apr 5.

Abstract

BACKGROUND/PURPOSE: ChatGPT, a large language model, can provide instant, personalized answers in a conversational format. Our study aimed to assess the potential application of ChatGPT-4, ChatGPT-4o without a prompt (ChatGPT-4o), and ChatGPT-4o with a prompt (ChatGPT-4o-P) in helping dental students study oral pathology (OP), by evaluating their performance in answering OP multiple choice questions (MCQs) of various types or subjects.

MATERIALS AND METHODS

A total of 280 OP MCQs were collected from the Taiwan National Dental Licensing Examinations. ChatGPT-4, ChatGPT-4o, and ChatGPT-4o-P were each instructed to answer these OP MCQs, which covered various question types and subjects.
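The abstract does not state how the questions were administered to the chatbots (they may simply have been entered through the web interface). Purely as an illustration, here is a minimal sketch of how the three configurations could be queried programmatically; the file op_mcqs.csv, the system prompt text, and the use of the OpenAI Python API are assumptions rather than details from the study, and image-based items would additionally need the figure attached.

```python
# Hedged sketch: one possible way to run the three configurations over the 280 OP MCQs.
# The question file, prompt wording, and API usage below are illustrative assumptions.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt representing the "with a prompt" (ChatGPT-4o-P) condition.
OP_PROMPT = ("You are an oral pathology expert. Answer the multiple choice question "
             "by giving the single best option (A, B, C, or D) only.")

CONFIGS = {
    "ChatGPT-4":    {"model": "gpt-4",  "system": None},
    "ChatGPT-4o":   {"model": "gpt-4o", "system": None},
    "ChatGPT-4o-P": {"model": "gpt-4o", "system": OP_PROMPT},
}

def ask(config: dict, question: str) -> str:
    """Send one MCQ to the chosen configuration and return the model's answer text."""
    messages = []
    if config["system"]:
        messages.append({"role": "system", "content": config["system"]})
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model=config["model"], messages=messages)
    return reply.choices[0].message.content.strip()

with open("op_mcqs.csv", newline="", encoding="utf-8") as f:  # hypothetical file
    questions = list(csv.DictReader(f))  # expected columns: question, answer, type, subject

answers = {name: [ask(cfg, q["question"]) for q in questions]
           for name, cfg in CONFIGS.items()}
```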

RESULTS

ChatGPT-4o-P achieved the highest overall accuracy rate (AR) of 90.0 %, slightly outperforming ChatGPT-4o (88.6 % AR) and significantly exceeding ChatGPT-4 (79.6 % AR, P < 0.001). For the odd-one-out questions, there was a significant difference in AR between ChatGPT-4 (77.2 % AR) and either ChatGPT-4o (91.3 % AR, P = 0.015) or ChatGPT-4o-P (92.4 % AR, P = 0.008). However, there was no significant difference in AR among the three models for the image-based and case-based questions. Of the 11 single-disease OP subjects, all three models achieved a 100 % AR in three subjects; ChatGPT-4o-P outperformed ChatGPT-4 and ChatGPT-4o in another three subjects; ChatGPT-4o was superior to ChatGPT-4 and ChatGPT-4o-P in a further three subjects; and ChatGPT-4o and ChatGPT-4o-P performed equally well, and both better than ChatGPT-4, in the remaining two subjects.
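The abstract reports P values but does not name the statistical test used. As a sketch under stated assumptions, the snippet below back-calculates approximate correct-answer counts from the reported accuracy rates on 280 questions and compares two models with a 2 x 2 chi-square test; the test choice and the rounded counts are assumptions, yet the ChatGPT-4 versus ChatGPT-4o-P comparison still comes out below 0.001, matching the reported P < 0.001.

```python
# Hedged sketch of the overall accuracy comparison. Counts are back-calculated from
# the reported accuracy rates (so they are approximate), and the chi-square test is
# an assumption; the abstract does not specify the test that was actually used.
from scipy.stats import chi2_contingency

N = 280  # total number of OP MCQs
correct = {
    "ChatGPT-4":    round(0.796 * N),  # ~223 correct answers
    "ChatGPT-4o":   round(0.886 * N),  # ~248 correct answers
    "ChatGPT-4o-P": round(0.900 * N),  # 252 correct answers
}

def compare(model_a: str, model_b: str) -> float:
    """Return the P value of a 2 x 2 chi-square test on correct vs. incorrect counts."""
    table = [[correct[model_a], N - correct[model_a]],
             [correct[model_b], N - correct[model_b]]]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

print(compare("ChatGPT-4", "ChatGPT-4o-P"))   # below 0.001 on these reconstructed counts
print(compare("ChatGPT-4o", "ChatGPT-4o-P"))  # not significant, consistent with "slightly outperforming"
```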

CONCLUSION

In the overall evaluation, ChatGPT-4o-P performed better than ChatGPT-4o and ChatGPT-4 in answering the OP MCQs.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36ff/12254725/1b6350e39003/gr1.jpg
