Performance of the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models in responding to dental implantology inquiries.

Authors

Taymour Noha, Fouda Shaimaa M, Abdelrahaman Hams H, Hassan Mohamed G

Affiliations

Lecturer, Department of Substitutive Dental Sciences, College of Dentistry, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.

Publication information

J Prosthet Dent. 2025 Jan 4. doi: 10.1016/j.prosdent.2024.12.016.

DOI: 10.1016/j.prosdent.2024.12.016
PMID: 39757053
Abstract

STATEMENT OF PROBLEM

Artificial intelligence (AI) chatbots have been proposed as promising resources for oral health information. However, the quality and readability of existing online health-related information are often inconsistent, which makes it challenging to use.

PURPOSE

This study aimed to compare the reliability and usefulness of dental implantology-related information provided by the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models (LLMs).

MATERIAL AND METHODS

A total of 75 questions were developed covering various dental implant domains. These questions were then presented to 3 different LLMs: ChatGPT-3.5, ChatGPT-4, and Google Gemini. The responses generated were recorded and independently assessed by 2 specialists who were blinded to the source of the responses. The evaluation focused on the accuracy of the generated answers, using a modified 5-point Likert scale to measure the reliability and usefulness of the information provided. Additionally, the ability of the AI chatbots to offer definitive responses to closed questions, provide reference citations, and advise scheduling consultations with a dental specialist was analyzed. The Friedman, Mann-Whitney U, and Spearman correlation tests were used for data analysis (α=.05).
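As an illustration of one of the tests named above, the following is a minimal sketch of Spearman's rank correlation between reliability and usefulness ratings, written with the Python standard library only. The ratings, variable names, and helper functions are hypothetical and are not taken from the study, and the study's own analysis software is not stated in the abstract.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical 5-point Likert ratings for eight responses (illustration only).
reliability = [5, 4, 4, 3, 5, 2, 4, 3]
usefulness = [5, 4, 3, 3, 4, 2, 5, 3]
print(f"rho = {spearman_rho(reliability, usefulness):.3f}")
```

Likert ratings produce many ties, which is why the sketch assigns average ranks rather than relying on the shortcut formula based on squared rank differences, which assumes no ties.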

RESULTS

Google Gemini exhibited higher reliability and usefulness scores compared with ChatGPT-3.5 and ChatGPT-4 (P<.001). Google Gemini also demonstrated superior proficiency in identifying closed questions (25 questions, 41%) and recommended specialist consultations for 74 questions (98.7%), significantly outperforming ChatGPT-4 (30 questions, 40.0%) and ChatGPT-3.5 (28 questions, 37.3%) (P<.001). A positive correlation was found between reliability and usefulness scores, with Google Gemini showing the strongest correlation (ρ=.702).

CONCLUSIONS

The 3 AI chatbots showed acceptable levels of reliability and usefulness in addressing dental implant-related queries. Google Gemini distinguished itself by providing responses consistent with specialist consultations.
