Taymour Noha, Fouda Shaimaa M, Abdelrahaman Hams H, Hassan Mohamed G
Lecturer, Department of Substitutive Dental Sciences, College of Dentistry, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
J Prosthet Dent. 2025 Jan 4. doi: 10.1016/j.prosdent.2024.12.016.
Artificial intelligence (AI) chatbots have been proposed as promising resources for oral health information. However, the quality and readability of existing online health-related information are often inconsistent, making such information challenging for patients to evaluate.
This study aimed to compare the reliability and usefulness of dental implantology-related information provided by the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models (LLMs).
A total of 75 questions were developed covering various dental implant domains. These questions were then presented to 3 different LLMs: ChatGPT-3.5, ChatGPT-4, and Google Gemini. The responses generated were recorded and independently assessed by 2 specialists who were blinded to the source of the responses. The evaluation focused on the accuracy of the generated answers, using a modified 5-point Likert scale to measure the reliability and usefulness of the information provided. Additionally, the ability of the AI chatbots to offer definitive responses to closed questions, provide reference citations, and advise scheduling consultations with a dental specialist was also analyzed. The Friedman, Mann-Whitney U, and Spearman correlation tests were used for data analysis (α=.05).
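The statistical workflow described above can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical reconstruction, assuming one 5-point Likert score per question per chatbot; the simulated scores and variable names are placeholders, not the study's data.

```python
# Minimal sketch of the nonparametric analysis described in the methods.
# All data below are simulated placeholders, not the study's ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical reliability scores (1-5 Likert) for 75 questions, one array per model.
gpt35 = rng.integers(3, 6, size=75)
gpt4 = rng.integers(3, 6, size=75)
gemini = rng.integers(4, 6, size=75)

# Friedman test: omnibus comparison of the 3 related samples (same 75 questions).
chi2, p_friedman = stats.friedmanchisquare(gpt35, gpt4, gemini)

# Mann-Whitney U: pairwise follow-up comparison (here, Gemini vs ChatGPT-4).
u, p_mwu = stats.mannwhitneyu(gemini, gpt4, alternative="two-sided")

# Spearman correlation between reliability and usefulness scores for one model.
usefulness_gemini = rng.integers(4, 6, size=75)
rho, p_rho = stats.spearmanr(gemini, usefulness_gemini)

alpha = 0.05
print(f"Friedman: chi2={chi2:.2f}, p={p_friedman:.4f} (significant: {p_friedman < alpha})")
print(f"Mann-Whitney U (Gemini vs GPT-4): U={u:.1f}, p={p_mwu:.4f}")
print(f"Spearman: rho={rho:.3f}, p={p_rho:.4f}")
```

Nonparametric tests are the natural choice here because Likert ratings are ordinal rather than interval-scaled.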
Google Gemini exhibited higher reliability and usefulness scores than ChatGPT-3.5 and ChatGPT-4 (P<.001). Google Gemini also demonstrated superior proficiency in providing definitive responses to closed questions (25 questions, 41%) and recommended specialist consultations for 74 questions (98.7%), significantly outperforming ChatGPT-4 (30 questions, 40.0%) and ChatGPT-3.5 (28 questions, 37.3%) (P<.001). A positive correlation was found between reliability and usefulness scores, with Google Gemini showing the strongest correlation (ρ=.702).
The 3 AI chatbots showed acceptable levels of reliability and usefulness in addressing dental implant-related queries. Google Gemini distinguished itself by consistently recommending consultation with a dental specialist.