Evidence-Based Potential of Generative Artificial Intelligence Large Language Models on Dental Avulsion: ChatGPT Versus Gemini.

Authors

Tokgöz Kaplan Taibe, Cankar Muhammet

Affiliations

Department of Pedodontics, Faculty of Dentistry, Karabuk University, Karabük, Turkey.

Private Clinic, Çorum, Turkey.

Publication information

Dent Traumatol. 2025 Apr;41(2):178-186. doi: 10.1111/edt.12999. Epub 2024 Nov 2.

Abstract

BACKGROUND

This study compared the accuracy and comprehensiveness of the answers that two artificial intelligence-based language models, ChatGPT and Gemini, gave to questions about dental avulsion.

MATERIALS AND METHODS

Based on the guidelines of the International Association of Dental Traumatology (IADT), a total of 33 questions on dental avulsion were prepared. They covered both technical and patient questions in three formats: multiple-choice questions, binary (true/false) questions, and open-ended questions. The questions were put to ChatGPT and Gemini, and the responses were recorded and scored by four pediatric dentists. Statistical analyses, including intraclass correlation coefficient (ICC) analysis, were performed to determine the agreement and accuracy of the responses. The significance level was set at p < 0.050.
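The abstract does not say which form of the ICC was used. As an illustrative sketch only, the Python function below computes one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss notation), for a table with one row per scored response and one column per rater. The function name `icc2_1` and the example scores are hypothetical and are not taken from the study.

```python
def icc2_1(ratings):
    """ICC(2,1) for a table with one row per rated item and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    The abstract does not state which ICC form the study used.
    """
    n = len(ratings)        # number of rated items (e.g. model responses)
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 items and 2 raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every item must be scored by every rater")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items, raters, and residual.
    ss_items = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_items - ss_raters

    ms_items = ss_items / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_items + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    if denom == 0:
        raise ValueError("scores have no variance; ICC is undefined")
    return (ms_items - ms_error) / denom


# Hypothetical scores: 5 responses (rows) rated by 4 dentists (columns).
example_scores = [
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 5, 4],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
]
print(round(icc2_1(example_scores), 3))
```

Values close to 1 indicate that the raters scored the responses almost identically; other ICC forms (for example, consistency rather than absolute agreement) can give different values for the same table.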

RESULTS

The mean score of the Gemini model was statistically significantly higher than that of ChatGPT (p = 0.001). ChatGPT gave more correct answers to open-ended and true/false questions on dental avulsion and showed its lowest accuracy on the multiple-choice questions. The median scores of Gemini's responses did not differ significantly across question types (p = 0.088). When ChatGPT and Gemini were compared with the Mann-Whitney U test without distinguishing between question types, Gemini's answers were statistically significantly more accurate (p = 0.004).
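The abstract names the Mann-Whitney U test for the overall comparison. As a rough sketch of how that test works, the Python code below ranks the pooled scores (tied values share the average rank), computes U, and derives a two-sided p-value from the tie-corrected normal approximation without continuity correction. The function names and the scores are hypothetical and are not the study's data; statistical packages may use exact or continuity-corrected p-values, so their results can differ slightly.

```python
import math
from collections import Counter
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Standard deviation of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    sigma = math.sqrt(max(variance, 0.0))
    if sigma == 0:
        return u1, 1.0                 # all values identical: no evidence of a shift

    z = (min(u1, u2) - n1 * n2 / 2) / sigma   # z <= 0 by construction
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u1, p


# Hypothetical per-question scores for two models (not the study's data).
model_a = [3, 4, 2, 5, 3, 4, 2, 3]
model_b = [4, 5, 4, 5, 3, 5, 4, 4]
u_stat, p_value = mann_whitney_u(model_a, model_b)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

A rank-based test of this kind suits ordinal rating scores because it makes no assumption that the scores are normally distributed.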

CONCLUSIONS

When assessed against the IADT guideline for dental avulsion, the Gemini and ChatGPT language models show promise. Successful incorporation of large language models (LLMs) into practice will require additional research, clinical validation, and improvements to the models.
