
ChatGPT-4 Omni's superiority in answering multiple-choice oral radiology questions.

Author information

Tassoker Melek

Affiliation

Department of Dentomaxillofacial Radiology, Faculty of Dentistry, Necmettin Erbakan University, Baglarbasi sk, Meram, Konya, 42050, Türkiye.

Publication information

BMC Oral Health. 2025 Feb 1;25(1):173. doi: 10.1186/s12903-025-05554-w.

Abstract

OBJECTIVES

This study evaluates and compares the performance of ChatGPT-3.5, ChatGPT-4 Omni (4o), Google Bard, and Microsoft Copilot in responding to text-based multiple-choice questions related to oral radiology, as featured in the Dental Specialty Admission Exam conducted in Türkiye.

MATERIALS AND METHODS

A collection of text-based multiple-choice questions was sourced from the open-access question bank of the Turkish Dental Specialty Admission Exam, covering the years 2012 to 2021. The study included 123 questions, each with five options and one correct answer. The accuracy levels of ChatGPT-3.5, ChatGPT-4o, Google Bard, and Microsoft Copilot were compared using descriptive statistics, the Kruskal-Wallis test, Dunn's post hoc test, and Cochran's Q test.
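The per-chatbot accuracy comparison described above can be sketched in code. The following is an illustrative, pure-Python implementation of Cochran's Q test (one of the tests named in the methods) on a questions-by-chatbots table of binary correct/incorrect scores; the `cochrans_q` helper and the toy `scores` table are assumptions for illustration, not the study's actual analysis or data.

```python
def cochrans_q(table):
    """Cochran's Q statistic for a questions x raters table of 0/1 scores.

    Returns (Q, df). The p-value is obtained from a chi-square
    distribution with df = k - 1 degrees of freedom, where k is the
    number of raters (here, chatbots).
    """
    k = len(table[0])                        # number of chatbots compared
    col_sums = [sum(col) for col in zip(*table)]   # per-chatbot totals
    row_sums = [sum(row) for row in table]         # per-question totals
    total = sum(row_sums)
    num = (k - 1) * (k * sum(g * g for g in col_sums) - total * total)
    den = k * total - sum(r * r for r in row_sums)
    return num / den, k - 1

# Toy data: 5 questions scored 1 (correct) / 0 (incorrect) for 4 chatbots.
scores = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
]
q_stat, df = cochrans_q(scores)
print(round(q_stat, 4), df)  # → 5.1429 3
```

In the study itself the table would have 123 rows (one per exam question) and 4 columns, and a significant Q would motivate the pairwise follow-up comparisons reported in the results.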

RESULTS AND DISCUSSION

The accuracy of the responses generated by the four chatbots exhibited statistically significant differences (p < 0.001). ChatGPT-4o achieved the highest accuracy at 86.1%, followed by Google Bard at 61.8%. ChatGPT-3.5 demonstrated an accuracy rate of 43.9%, while Microsoft Copilot recorded a rate of 41.5%.

CONCLUSION

ChatGPT-4o showcases superior accuracy and advanced reasoning capabilities, positioning it as a promising educational tool. With regular updates, it has the potential to serve as a reliable source of information for both healthcare professionals and the general public.

CLINICAL TRIAL NUMBER

Not applicable.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db98/11786404/d1781eb97706/12903_2025_5554_Fig1_HTML.jpg
