Baris Sevda Durust, Baris Kubilay
Kırıkkale University, Kırıkkale, Turkey.
BMC Oral Health. 2025 May 22;25(1):763. doi: 10.1186/s12903-025-06149-1.
The objective of this study was to evaluate the performance of ScholarGPT, ChatGPT-4o and Google Gemini in responding to queries pertaining to endodontic apical surgery, a subject that demands advanced specialist knowledge in endodontics.
A total of 30 questions, comprising 12 binary and 18 open-ended queries, were formulated from the material on endodontic apical surgery in the well-known endodontic textbook Cohen's Pathways of the Pulp (12th edition). The questions were posed by two different researchers using separate accounts on the ScholarGPT, ChatGPT-4o and Gemini platforms. The responses were then coded by the researchers and categorised as 'correct', 'incorrect', or 'insufficient'. The Pearson chi-square test was used to compare response accuracy across the platforms.
A total of 5,400 responses were evaluated. Chi-square analysis revealed statistically significant differences in the accuracy of the responses provided by the applications (χ² = 22.61; p < 0.05). ScholarGPT demonstrated the highest rate of correct responses (97.7%), followed by ChatGPT-4o with 90.1%. Conversely, Gemini exhibited the lowest correct response rate (59.5%) among the applications examined.
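As a minimal sketch of how such an analysis could be run, the snippet below applies a Pearson chi-square test of independence to a platform × response-category contingency table using scipy.stats.chi2_contingency. The correct-response counts are back-calculated from the reported rates (97.7%, 90.1% and 59.5% of an assumed 1,800 responses per platform), while the split between 'incorrect' and 'insufficient' is purely hypothetical, so the resulting statistic is not expected to reproduce the published χ² = 22.61.

```python
# Illustrative sketch only: Pearson chi-square test of independence on a
# hypothetical platform x response-category contingency table. The
# incorrect/insufficient split below is invented, NOT the study's data.
from scipy.stats import chi2_contingency

platforms = ["ScholarGPT", "ChatGPT-4o", "Gemini"]

# Rows: platforms; columns: correct / incorrect / insufficient counts.
# Correct counts are scaled from the reported rates assuming 1,800
# responses per platform (5,400 total / 3 platforms).
observed = [
    [1758, 22, 20],    # ScholarGPT  (~97.7% correct)
    [1622, 118, 60],   # ChatGPT-4o  (~90.1% correct)
    [1071, 529, 200],  # Gemini      (~59.5% correct)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4g}")

for name, row in zip(platforms, observed):
    rate = 100 * row[0] / sum(row)
    print(f"{name}: {rate:.1f}% correct")
```

The test treats platform and response category as the two classification variables; a significant result indicates that response quality is not independent of the platform used.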
ScholarGPT performed better overall on questions about endodontic apical surgery than ChatGPT-4o and Gemini. GPT models based on academic databases, such as ScholarGPT, may provide more accurate information about dentistry. However, additional research should be conducted to develop a GPT model that is specifically tailored to the field of endodontics.