Felemban Doaa, Jazzar Ahoud, Mair Yasmin, Alsharif Maha, Alsharif Alla, Kassim Saba
Department of Oral and Maxillofacial Diagnostic Sciences, College of Dentistry, Taibah University, Al-Madinah Al-Munawwarah, Saudi Arabia.
Department of Oral Diagnostic Sciences, Faculty of Dentistry, King Abdulaziz University, Jeddah, Saudi Arabia.
Digit Health. 2025 Jul 8;11:20552076251355847. doi: 10.1177/20552076251355847. eCollection 2025 Jan-Dec.
This study evaluates the accuracy of ChatGPT models (3.5, 4.0, and 4 Turbo) in answering multiple-choice questions (MCQs) on oral and maxillofacial pathology and oral radiology, and thereby their reliability as a source of information in dentistry.
A set of 136 validated MCQs, comprising both knowledge-based and cognitive items, was used in the study. The questions covered topics related to odontogenic cysts, tumours, and bone lesions. Question difficulty was rated by two board-certified reviewers experienced in MCQ item writing in these fields. The questions were entered into ChatGPT-3.5, ChatGPT-4, and ChatGPT-4 Turbo independently.
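The abstract does not specify how the questions were submitted (the ChatGPT web interface is the likely route); for readers who want to reproduce a comparison of this kind programmatically, the following is a minimal sketch using the OpenAI API. The model identifiers, the sample question, and the ask_mcq helper are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: posing one validated MCQ to three OpenAI models.
# The study used the ChatGPT interface; this API-based version is an
# illustrative assumption, not the authors' actual procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]  # assumed identifiers

question = (
    "Which odontogenic cyst most commonly presents as a multilocular "
    "radiolucency in the posterior mandible?\n"
    "A) Radicular cyst\nB) Odontogenic keratocyst\n"
    "C) Nasopalatine duct cyst\nD) Gingival cyst"
)

def ask_mcq(model: str, mcq: str) -> str:
    """Submit one MCQ and return the model's single-letter answer."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer the multiple-choice question with one letter only."},
            {"role": "user", "content": mcq},
        ],
        temperature=0,  # deterministic output for reproducibility
    )
    return resp.choices[0].message.content.strip()

for m in MODELS:
    print(m, "->", ask_mcq(m, question))
```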
Fifty-six percent of the questions related to oral radiology, and 66% were categorised as easy. The dataset consisted primarily of knowledge-based questions (87%), with only 13% assessing cognitive skills. ChatGPT-4 Turbo exhibited the highest accuracy, answering 90% of questions correctly, followed by ChatGPT-4.0 at 85% and ChatGPT-3.5 at 78%. Ninety-eight questions (72%) were answered correctly by all three models. Ten months later, the free ChatGPT version showed a significant improvement in accuracy, while the paid versions maintained consistent performance over time with no significant differences.
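The abstract does not name the statistical test behind the over-time comparison. As a sketch only, the snippet below back-calculates approximate correct counts from the reported percentages (136 questions) and shows how a paired before/after comparison could be tested with McNemar's test; the contingency counts are invented for illustration.

```python
# Hypothetical check of the reported accuracies plus an example of how the
# ten-month rerun could be compared. All paired counts below are assumed,
# not taken from the study.
from statsmodels.stats.contingency_tables import mcnemar

N = 136
for model, pct in [("ChatGPT-3.5", 0.78), ("ChatGPT-4.0", 0.85),
                   ("ChatGPT-4 Turbo", 0.90)]:
    print(f"{model}: ~{round(N * pct)} of {N} correct ({pct:.0%})")

# Assumed paired outcomes for one model at two time points:
# rows = first run (correct / incorrect), cols = rerun 10 months later.
table = [[100, 6],   # correct -> correct, correct -> incorrect
         [18, 12]]   # incorrect -> correct, incorrect -> incorrect
result = mcnemar(table, exact=True)  # exact binomial McNemar test
print(f"McNemar p-value: {result.pvalue:.4f}")
```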
The findings suggest that, while AI can be a helpful tool in dental education, limitations persist that must be addressed, particularly in terms of complex cognitive skills and image-based questions. This study provides valuable insights into the capabilities and potential improvements of AI applications in dental education.