Performance of ChatGPT 3.5 and 4 on U.S. dental examinations: the INBDE, ADAT, and DAT.

Author information

Dashti Mahmood, Ghasemi Shohreh, Ghadimi Niloofar, Hefzi Delband, Karimian Azizeh, Zare Niusha, Fahimipour Amir, Khurshid Zohaib, Chafjiri Maryam Mohammadalizadeh, Ghaedsharaf Sahar

Affiliations

Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Department of Trauma and Craniofacial Reconstruction, Queen Mary College, London, England.

Publication information

Imaging Sci Dent. 2024 Sep;54(3):271-275. doi: 10.5624/isd.20240037. Epub 2024 Jul 2.

Abstract

PURPOSE

Recent advancements in artificial intelligence (AI), particularly tools such as ChatGPT developed by OpenAI, a U.S.-based AI research organization, have transformed the healthcare and education sectors. This study investigated the effectiveness of ChatGPT in answering dentistry exam questions, demonstrating its potential to enhance professional practice and patient care.

MATERIALS AND METHODS

This study assessed the performance of ChatGPT 3.5 and 4 on U.S. dental exams - specifically, the Integrated National Board Dental Examination (INBDE), Dental Admission Test (DAT), and Advanced Dental Admission Test (ADAT) - excluding image-based questions. Customized prompts were used to elicit ChatGPT's answers, which were then evaluated against the official answer sheets.
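
The abstract does not publish the prompts or the grading pipeline. As a minimal sketch of this kind of evaluation, assuming the OpenAI chat completions API, an illustrative model name, and a hypothetical answer-key structure (none of which are specified in the paper):

    # Illustrative sketch only; the study's actual prompts, model settings,
    # and scoring workflow are not described in the abstract.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    def ask_item(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
        """Pose one text-only multiple-choice item; return the letter the model picks."""
        prompt = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "You are answering a dental exam question. "
                            "Reply with only the letter of the best option."},
                {"role": "user", "content": prompt},
            ],
        )
        return resp.choices[0].message.content.strip()[0].upper()

    def accuracy(items: list[tuple[str, dict[str, str], str]], model: str) -> float:
        """items holds (stem, options, correct_letter) triples from a hypothetical key."""
        correct = sum(ask_item(stem, opts, model) == key for stem, opts, key in items)
        return correct / len(items)

Scoring each exam section separately with a function like accuracy would yield per-category rates comparable to those reported below.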

RESULTS

ChatGPT 3.5 and 4 were tested with 253 questions from the INBDE, ADAT, and DAT exams. For the INBDE, both versions achieved 80% accuracy in knowledge-based questions and 66-69% in case history questions. In ADAT, they scored 66-83% in knowledge-based and 76% in case history questions. ChatGPT 4 excelled on the DAT, with 94% accuracy in knowledge-based questions, 57% in mathematical analysis items, and 100% in comprehension questions, surpassing ChatGPT 3.5's rates of 83%, 31%, and 82%, respectively. The difference was significant for knowledge-based questions (P=0.009). Both versions showed similar patterns in incorrect responses.
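
The abstract does not name the statistical test behind P=0.009. A common choice for comparing two accuracy proportions is Fisher's exact test on the correct/incorrect counts; the counts below are hypothetical stand-ins (per-category item counts are not reported here), so the computed p-value is not the paper's:

    # Hypothetical counts for illustration; the abstract reports only
    # percentage accuracies, not per-category item counts or the test used.
    from scipy.stats import fisher_exact

    # Assume 100 knowledge-based DAT items so the reported accuracies
    # (94% for ChatGPT 4, 83% for ChatGPT 3.5) map to whole-number counts.
    table = [[94, 6],    # ChatGPT 4: correct, incorrect
             [83, 17]]   # ChatGPT 3.5: correct, incorrect
    _, p_value = fisher_exact(table)
    print(f"Fisher's exact test p-value: {p_value:.3f}")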

CONCLUSION

Both ChatGPT 3.5 and 4 effectively handled knowledge-based, case history, and comprehension questions, with ChatGPT 4 being more reliable and surpassing the performance of 3.5. ChatGPT 4's perfect score in comprehension questions underscores its trainability in specific subjects. However, both versions exhibited weaker performance in mathematical analysis, suggesting this as an area for improvement.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9beb/11450412/f1a75a18c2ca/isd-54-271-g001.jpg
