Felemban Doaa, Jazzar Ahoud, Mair Yasmin, Alsharif Maha, Alsharif Alla, Kassim Saba
Department of Oral and Maxillofacial Diagnostic Sciences, College of Dentistry, Taibah University, Al-Madinah Al-Munawwarah, Saudi Arabia.
Department of Oral Diagnostic Sciences, Faculty of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia.
Digit Health. 2025 Jul 8;11:20552076251355847. doi: 10.1177/20552076251355847. eCollection 2025 Jan-Dec.
This study evaluates the accuracy of ChatGPT models (3.5, 4.0, and 4 Turbo) in answering multiple-choice questions (MCQs) on oral and maxillofacial pathology and oral radiology, and thereby their reliability as a source of information in dentistry.
A set of 136 validated MCQs, comprising both knowledge-based and cognitive items, was used in the study. The questions covered topics related to odontogenic cysts, tumours, and bone lesions. Question difficulty was rated by two board-certified reviewers experienced in MCQ item writing in these fields. The questions were entered into ChatGPT-3.5, ChatGPT-4, and ChatGPT-4 Turbo independently.
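The abstract does not specify how the questions were submitted (the ChatGPT web interface is the likely route); for readers who want to reproduce a comparison of this kind programmatically, the following is a minimal sketch using the OpenAI API. The model identifiers, the sample question, and the ask_mcq helper are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: posing one validated MCQ to three OpenAI models.
# The study used the ChatGPT interface; this API-based version is an
# illustrative assumption, not the authors' actual procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]  # assumed identifiers

question = (
    "Which odontogenic cyst most commonly presents as a multilocular "
    "radiolucency in the posterior mandible?\n"
    "A) Radicular cyst\nB) Odontogenic keratocyst\n"
    "C) Nasopalatine duct cyst\nD) Gingival cyst"
)

def ask_mcq(model: str, mcq: str) -> str:
    """Submit one MCQ and return the model's single-letter answer."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer the multiple-choice question with one letter only."},
            {"role": "user", "content": mcq},
        ],
        temperature=0,  # deterministic output for reproducibility
    )
    return resp.choices[0].message.content.strip()

for m in MODELS:
    print(m, "->", ask_mcq(m, question))
```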
Fifty-six percent of the questions related to oral radiology, and 66% were categorised as easy. The dataset consisted primarily of knowledge-based questions (87%), with only 13% assessing cognitive skills. ChatGPT-4 Turbo exhibited the highest accuracy, answering 90% of questions correctly, followed by ChatGPT-4.0 at 85% and ChatGPT-3.5 at 78%. Ninety-eight questions (72%) were answered correctly by all three models. Ten months later, the free ChatGPT version showed a significant improvement in accuracy, while the paid versions maintained consistent performance over time with no significant differences.
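The abstract does not name the statistical test behind the over-time comparison. As a sketch only, the snippet below back-calculates approximate correct counts from the reported percentages (136 questions) and shows how a paired before/after comparison could be tested with McNemar's test; the contingency counts are invented for illustration.

```python
# Hypothetical check of the reported accuracies plus an example of how the
# ten-month rerun could be compared. All paired counts below are assumed,
# not taken from the study.
from statsmodels.stats.contingency_tables import mcnemar

N = 136
for model, pct in [("ChatGPT-3.5", 0.78), ("ChatGPT-4.0", 0.85),
                   ("ChatGPT-4 Turbo", 0.90)]:
    print(f"{model}: ~{round(N * pct)} of {N} correct ({pct:.0%})")

# Assumed paired outcomes for one model at two time points:
# rows = first run (correct / incorrect), cols = rerun 10 months later.
table = [[100, 6],   # correct -> correct, correct -> incorrect
         [18, 12]]   # incorrect -> correct, incorrect -> incorrect
result = mcnemar(table, exact=True)  # exact binomial McNemar test
print(f"McNemar p-value: {result.pvalue:.4f}")
```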
The findings suggest that, while AI can be a helpful tool in dental education, limitations persist that must be addressed, particularly in terms of complex cognitive skills and image-based questions. This study provides valuable insights into the capabilities and potential improvements of AI applications in dental education.