Baris Sevda Durust, Baris Kubilay
Kırıkkale University, Kırıkkale, Turkey.
BMC Oral Health. 2025 May 22;25(1):763. doi: 10.1186/s12903-025-06149-1.
The objective of this study was to evaluate the performance of ScholarGPT, ChatGPT-4o and Google Gemini in responding to queries pertaining to endodontic apical surgery, a subject that demands advanced specialist knowledge in endodontics.
A total of 30 questions, comprising 12 binary and 18 open-ended queries, were formulated from the material on endodontic apical surgery in the well-known endodontic textbook Cohen's Pathways of the Pulp (12th edition). The questions were posed by two different researchers using separate accounts on the ScholarGPT, ChatGPT-4o and Gemini platforms. The responses were then coded by the researchers and categorised as 'correct', 'incorrect', or 'insufficient'. The Pearson chi-square test was used to compare response accuracy across the platforms.
A total of 5,400 responses were evaluated. Chi-square analysis revealed statistically significant differences in the accuracy of the responses provided by the applications (χ² = 22.61; p < 0.05). ScholarGPT demonstrated the highest rate of correct responses (97.7%), followed by ChatGPT-4o with 90.1%. Conversely, Gemini exhibited the lowest correct response rate (59.5%) among the applications examined.
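As a minimal sketch of how such an analysis could be run, the snippet below applies a Pearson chi-square test of independence to a platform × response-category contingency table using scipy.stats.chi2_contingency. The correct-response counts are back-calculated from the reported rates (97.7%, 90.1% and 59.5% of an assumed 1,800 responses per platform), while the split between 'incorrect' and 'insufficient' is purely hypothetical, so the resulting statistic is not expected to reproduce the published χ² = 22.61.

```python
# Illustrative sketch only: Pearson chi-square test of independence on a
# hypothetical platform x response-category contingency table. The
# incorrect/insufficient split below is invented, NOT the study's data.
from scipy.stats import chi2_contingency

platforms = ["ScholarGPT", "ChatGPT-4o", "Gemini"]

# Rows: platforms; columns: correct / incorrect / insufficient counts.
# Correct counts are scaled from the reported rates assuming 1,800
# responses per platform (5,400 total / 3 platforms).
observed = [
    [1758, 22, 20],    # ScholarGPT  (~97.7% correct)
    [1622, 118, 60],   # ChatGPT-4o  (~90.1% correct)
    [1071, 529, 200],  # Gemini      (~59.5% correct)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4g}")

for name, row in zip(platforms, observed):
    rate = 100 * row[0] / sum(row)
    print(f"{name}: {rate:.1f}% correct")
```

The test treats platform and response category as the two classification variables; a significant result indicates that response quality is not independent of the platform used.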
ScholarGPT performed better overall on questions about endodontic apical surgery than ChatGPT-4o and Gemini. GPT models based on academic databases, such as ScholarGPT, may provide more accurate information about dentistry. However, additional research should be conducted to develop a GPT model that is specifically tailored to the field of endodontics.