Tassoker Melek
Department of Dentomaxillofacial Radiology, Faculty of Dentistry, Necmettin Erbakan University, Baglarbasi sk, Meram, Konya, 42050, Türkiye.
BMC Oral Health. 2025 Feb 1;25(1):173. doi: 10.1186/s12903-025-05554-w.
This study evaluates and compares the performance of ChatGPT-3.5, ChatGPT-4 Omni (4o), Google Bard, and Microsoft Copilot in responding to text-based multiple-choice questions related to oral radiology, as featured in the Dental Specialty Admission Exam conducted in Türkiye.
A collection of text-based multiple-choice questions was sourced from the open-access question bank of the Turkish Dental Specialty Admission Exam, covering the years 2012 to 2021. The study included 123 questions, each with five options and one correct answer. The accuracy levels of ChatGPT-3.5, ChatGPT-4o, Google Bard, and Microsoft Copilot were compared using descriptive statistics, the Kruskal-Wallis test, Dunn's post hoc test, and Cochran's Q test.
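For readers unfamiliar with Cochran's Q, the test compares proportions of a binary outcome (here, correct vs. incorrect) across several related groups answering the same items. A minimal sketch of the statistic on hypothetical data (not the study's data or code):

```python
# Minimal sketch (hypothetical data, not the study's code): Cochran's Q
# statistic for paired binary outcomes, e.g. correct (1) / incorrect (0)
# answers from k chatbots on the same set of questions.

def cochrans_q(rows):
    """rows: one tuple of 0/1 outcomes per question, one entry per chatbot.

    Returns the Q statistic, approximately chi-square distributed with
    k - 1 degrees of freedom under the null hypothesis of equal accuracy.
    """
    k = len(rows[0])                                           # number of chatbots
    col_totals = [sum(r[j] for r in rows) for j in range(k)]   # correct count per chatbot
    row_totals = [sum(r) for r in rows]                        # correct count per question
    grand = sum(row_totals)
    num = (k - 1) * (k * sum(c * c for c in col_totals) - grand * grand)
    den = k * grand - sum(r * r for r in row_totals)
    return num / den

# Hypothetical correctness patterns: 5 questions, 4 chatbots.
answers = [(1, 1, 0, 0), (1, 0, 0, 0), (1, 1, 1, 0), (1, 0, 1, 0), (0, 1, 0, 0)]
q = cochrans_q(answers)  # q = 105/17, about 6.18
# Compare q against the chi-square critical value 7.815 (df = 3, alpha = 0.05).
```

With two groups the statistic reduces to the (uncorrected) McNemar test, which is a convenient sanity check for the arithmetic.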
The accuracy of the responses generated by the four chatbots differed significantly (p < 0.001). ChatGPT-4o achieved the highest accuracy at 86.1%, followed by Google Bard at 61.8%. ChatGPT-3.5 demonstrated an accuracy rate of 43.9%, while Microsoft Copilot recorded a rate of 41.5%.
ChatGPT-4o demonstrated superior accuracy and reasoning ability, positioning it as a promising educational tool. With regular updates, it has the potential to serve as a reliable source of information for both healthcare professionals and the general public.
Not applicable.