Performance of ChatGPT 3.5 and 4 on U.S. dental examinations: the INBDE, ADAT, and DAT.

Author information

Dashti Mahmood, Ghasemi Shohreh, Ghadimi Niloofar, Hefzi Delband, Karimian Azizeh, Zare Niusha, Fahimipour Amir, Khurshid Zohaib, Chafjiri Maryam Mohammadalizadeh, Ghaedsharaf Sahar

Affiliations

Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Department of Trauma and Craniofacial Reconstruction, Queen Mary College, London, England.

Publication information

Imaging Sci Dent. 2024 Sep;54(3):271-275. doi: 10.5624/isd.20240037. Epub 2024 Jul 2.

Abstract

PURPOSE

Recent advancements in artificial intelligence (AI), particularly tools such as ChatGPT developed by OpenAI, a U.S.-based AI research organization, have transformed the healthcare and education sectors. This study investigated the effectiveness of ChatGPT in answering dentistry exam questions, demonstrating its potential to enhance professional practice and patient care.

MATERIALS AND METHODS

This study assessed the performance of ChatGPT 3.5 and 4 on U.S. dental exams - specifically, the Integrated National Board Dental Examination (INBDE), Dental Admission Test (DAT), and Advanced Dental Admission Test (ADAT) - excluding image-based questions. Customized prompts were used to elicit ChatGPT's answers, which were then evaluated against the official answer sheets.
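
The abstract does not publish the prompts or the grading pipeline. As a minimal sketch of this kind of evaluation, assuming the OpenAI chat completions API, an illustrative model name, and a hypothetical answer-key structure (none of which are specified in the paper):

    # Illustrative sketch only; the study's actual prompts, model settings,
    # and scoring workflow are not described in the abstract.
    from openai import OpenAI

    client = OpenAI()  # reads the OPENAI_API_KEY environment variable

    def ask_item(stem: str, options: dict[str, str], model: str = "gpt-4") -> str:
        """Pose one text-only multiple-choice item; return the letter the model picks."""
        prompt = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "You are answering a dental exam question. "
                            "Reply with only the letter of the best option."},
                {"role": "user", "content": prompt},
            ],
        )
        return resp.choices[0].message.content.strip()[0].upper()

    def accuracy(items: list[tuple[str, dict[str, str], str]], model: str) -> float:
        """items holds (stem, options, correct_letter) triples from a hypothetical key."""
        correct = sum(ask_item(stem, opts, model) == key for stem, opts, key in items)
        return correct / len(items)

Scoring each exam section separately with a function like accuracy would yield per-category rates comparable to those reported below.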

RESULTS

ChatGPT 3.5 and 4 were tested with 253 questions from the INBDE, ADAT, and DAT exams. For the INBDE, both versions achieved 80% accuracy in knowledge-based questions and 66-69% in case history questions. In ADAT, they scored 66-83% in knowledge-based and 76% in case history questions. ChatGPT 4 excelled on the DAT, with 94% accuracy in knowledge-based questions, 57% in mathematical analysis items, and 100% in comprehension questions, surpassing ChatGPT 3.5's rates of 83%, 31%, and 82%, respectively. The difference was significant for knowledge-based questions (P=0.009). Both versions showed similar patterns in incorrect responses.
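
The abstract does not name the statistical test behind P=0.009. A common choice for comparing two accuracy proportions is Fisher's exact test on the correct/incorrect counts; the counts below are hypothetical stand-ins (per-category item counts are not reported here), so the computed p-value is not the paper's:

    # Hypothetical counts for illustration; the abstract reports only
    # percentage accuracies, not per-category item counts or the test used.
    from scipy.stats import fisher_exact

    # Assume 100 knowledge-based DAT items so the reported accuracies
    # (94% for ChatGPT 4, 83% for ChatGPT 3.5) map to whole-number counts.
    table = [[94, 6],    # ChatGPT 4: correct, incorrect
             [83, 17]]   # ChatGPT 3.5: correct, incorrect
    _, p_value = fisher_exact(table)
    print(f"Fisher's exact test p-value: {p_value:.3f}")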

CONCLUSION

Both ChatGPT 3.5 and 4 effectively handled knowledge-based, case history, and comprehension questions, with ChatGPT 4 being more reliable and surpassing the performance of 3.5. ChatGPT 4's perfect score in comprehension questions underscores its trainability in specific subjects. However, both versions exhibited weaker performance in mathematical analysis, suggesting this as an area for improvement.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9beb/11450412/f1a75a18c2ca/isd-54-271-g001.jpg
