Taymour Noha, Fouda Shaimaa M, Abdelrahaman Hams H, Hassan Mohamed G
Lecturer, Department of Substitutive Dental Sciences, College of Dentistry, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
J Prosthet Dent. 2025 Jan 4. doi: 10.1016/j.prosdent.2024.12.016.
Artificial intelligence (AI) chatbots have been proposed as promising resources for oral health information. However, the quality and readability of existing online health-related information are often inconsistent, making such information challenging for patients to evaluate.
This study aimed to compare the reliability and usefulness of dental implantology-related information provided by the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models (LLMs).
A total of 75 questions were developed covering various dental implant domains. These questions were then presented to 3 different LLMs: ChatGPT-3.5, ChatGPT-4, and Google Gemini. The responses generated were recorded and independently assessed by 2 specialists who were blinded to the source of the responses. The evaluation focused on the accuracy of the generated answers, using a modified 5-point Likert scale to measure the reliability and usefulness of the information provided. Additionally, the ability of the AI chatbots to offer definitive responses to closed questions, provide reference citations, and advise scheduling consultations with a dental specialist was also analyzed. The Friedman, Mann-Whitney U, and Spearman correlation tests were used for data analysis (α=.05).
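The statistical workflow described above can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical reconstruction, assuming one 5-point Likert score per question per chatbot; the simulated scores and variable names are placeholders, not the study's data.

```python
# Minimal sketch of the nonparametric analysis described in the methods.
# All data below are simulated placeholders, not the study's ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical reliability scores (1-5 Likert) for 75 questions, one array per model.
gpt35 = rng.integers(3, 6, size=75)
gpt4 = rng.integers(3, 6, size=75)
gemini = rng.integers(4, 6, size=75)

# Friedman test: omnibus comparison of the 3 related samples (same 75 questions).
chi2, p_friedman = stats.friedmanchisquare(gpt35, gpt4, gemini)

# Mann-Whitney U: pairwise follow-up comparison (here, Gemini vs ChatGPT-4).
u, p_mwu = stats.mannwhitneyu(gemini, gpt4, alternative="two-sided")

# Spearman correlation between reliability and usefulness scores for one model.
usefulness_gemini = rng.integers(4, 6, size=75)
rho, p_rho = stats.spearmanr(gemini, usefulness_gemini)

alpha = 0.05
print(f"Friedman: chi2={chi2:.2f}, p={p_friedman:.4f} (significant: {p_friedman < alpha})")
print(f"Mann-Whitney U (Gemini vs GPT-4): U={u:.1f}, p={p_mwu:.4f}")
print(f"Spearman: rho={rho:.3f}, p={p_rho:.4f}")
```

Nonparametric tests are the natural choice here because Likert ratings are ordinal rather than interval-scaled.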
Google Gemini exhibited higher reliability and usefulness scores than ChatGPT-3.5 and ChatGPT-4 (P<.001). Google Gemini also demonstrated superior proficiency in providing definitive responses to closed questions (25 questions, 41%) and recommended specialist consultations for 74 questions (98.7%), significantly outperforming ChatGPT-4 (30 questions, 40.0%) and ChatGPT-3.5 (28 questions, 37.3%) (P<.001). A positive correlation was found between reliability and usefulness scores, with Google Gemini showing the strongest correlation (ρ=.702).
The 3 AI chatbots showed acceptable levels of reliability and usefulness in addressing dental implant-related queries. Google Gemini distinguished itself by consistently recommending consultation with a dental specialist.