Performance of AI-Chatbots to Common Temporomandibular Joint Disorders (TMDs) Patient Queries: Accuracy, Completeness, Reliability and Readability.

Authors

Hassan Mohamed G, Abdelaziz Ahmed A, Abdelrahman Hams H, Mohamed Mostafa M Y, Ellabban Mohamed T

Affiliations

Division of Bone and Mineral Diseases, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.

Department of Orthodontics, Faculty of Dentistry, Assiut University, Assiut, Egypt.

Publication information

Orthod Craniofac Res. 2025 May 7. doi: 10.1111/ocr.12939.

DOI: 10.1111/ocr.12939

PMID: 40332142

Abstract

Temporomandibular joint disorders (TMDs) are a common group of conditions affecting the temporomandibular joint (TMJ), often resulting from factors such as injury, stress or teeth grinding. This study aimed to evaluate the accuracy, completeness, reliability and readability of the responses generated by ChatGPT-3.5, ChatGPT-4o and Google Gemini to TMD-related inquiries. Forty-five questions covering various aspects of TMDs were created by two experts and submitted by one author to ChatGPT-3.5, ChatGPT-4o and Google Gemini on the same day. The responses were evaluated for accuracy, completeness and reliability using modified Likert scales. Readability was analysed with six validated indices using a specialised tool. Additional features, such as the inclusion of graphical elements, references and safeguard mechanisms, were also documented and analysed. Data were analysed with the Pearson chi-square test and one-way ANOVA. Google Gemini achieved the highest accuracy, with 100% of its responses correct, followed by ChatGPT-3.5 (95.6%) and ChatGPT-4o (93.3%). ChatGPT-4o provided the most complete responses (91.1%), followed by ChatGPT-3.5 (64.4%) and Google Gemini (42.2%). Most responses were reliable: 93.3% of ChatGPT-4o responses were rated 'Absolutely Reliable', compared with 46.7% for ChatGPT-3.5 and 48.9% for Google Gemini. ChatGPT-4o and Google Gemini included references in 22.2% and 13.3% of their responses, respectively, whereas ChatGPT-3.5 included none. Google Gemini was the only model that included multimedia (in 6.7% of responses). Readability scores were highest for ChatGPT-3.5, suggesting that its responses were more complex than those of Google Gemini and ChatGPT-4o. Both ChatGPT-4o and Google Gemini answered TMD-related questions accurately and reliably, with clear, easy-to-understand responses complemented by safeguard statements encouraging specialist consultation. However, both platforms largely lacked evidence-based references. Only Google Gemini incorporated multimedia elements into its answers.
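The abstract names the Pearson chi-square test as one of its analyses. As a hedged illustration only (not the authors' analysis or data), the sketch below runs a chi-square test of independence on a 2 × 3 table built from the reported completeness percentages. It assumes each percentage is the share of the 45 questions rated in the top completeness category (41/45, 29/45 and 19/45) and pools every lower Likert rating into one "not fully complete" row; because the paper's scales have more than two levels, the statistic need not match anything the authors report. A 2 × 3 table has (2 − 1)(3 − 1) = 2 degrees of freedom, and for 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−x/2), so no statistics library is required.

```python
import math

# Illustrative 2 x 3 contingency table (an assumption, not the authors' data):
# counts reconstructed from the reported completeness percentages of 45 questions.
models = ["ChatGPT-4o", "ChatGPT-3.5", "Google Gemini"]
complete = [41, 29, 19]                     # 91.1%, 64.4%, 42.2% of 45
not_complete = [45 - c for c in complete]   # all lower Likert ratings pooled


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for Pearson's chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_sf_df2(statistic):
    """Upper-tail p-value of the chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)


stat, dof = chi_square_independence([complete, not_complete])
assert dof == 2  # the closed-form p-value is only valid for df = 2
p_value = chi_square_sf_df2(stat)
for name, c in zip(models, complete):
    print(f"{name}: {c}/45 rated fully complete")
print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value:.2g}")
```

For tables of any size, `scipy.stats.chi2_contingency` performs the same test and computes the p-value for arbitrary degrees of freedom; the hand-written version is shown only to make the arithmetic visible.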

Similar articles

A Comparative Analysis of Artificial Intelligence Platforms: ChatGPT-4o and Google Gemini in Answering Questions About Birth Control Methods.

Cureus. 2025 Jan 1;17(1):e76745. doi: 10.7759/cureus.76745. eCollection 2025 Jan.

The use of ChatGPT and Google Gemini in responding to orthognathic surgery-related questions: A comparative study.

J World Fed Orthod. 2025 Feb;14(1):20-26. doi: 10.1016/j.ejwf.2024.09.004. Epub 2024 Oct 28.

Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.

Clin Rheumatol. 2024 Nov;43(11):3507-3513. doi: 10.1007/s10067-024-07154-5. Epub 2024 Sep 28.

Performance of the ChatGPT-3.5, ChatGPT-4, and Google Gemini large language models in responding to dental implantology inquiries.

J Prosthet Dent. 2025 Jan 4. doi: 10.1016/j.prosdent.2024.12.016.

Evaluating ChatGPT and Google Gemini Performance and Implications in Turkish Dental Education.

Cureus. 2025 Jan 11;17(1):e77292. doi: 10.7759/cureus.77292. eCollection 2025 Jan.

Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry.

J Esthet Restor Dent. 2025 Jul;37(7):1740-1752. doi: 10.1111/jerd.13447. Epub 2025 Mar 2.

Dr. Chatbot: Investigating the Quality and Quantity of Responses Generated by Three AI Chatbots to Prompts Regarding Carpal Tunnel Syndrome.

Cureus. 2025 Mar 24;17(3):e81068. doi: 10.7759/cureus.81068. eCollection 2025 Mar.

Evaluating the Efficacy of Artificial Intelligence-Driven Chatbots in Addressing Queries on Vernal Conjunctivitis.

Cureus. 2025 Feb 26;17(2):e79688. doi: 10.7759/cureus.79688. eCollection 2025 Feb.

Comparative analysis of ChatGPT-4o mini, ChatGPT-4o and Gemini Advanced in the treatment of postmenopausal osteoporosis.

BMC Musculoskelet Disord. 2025 Apr 16;26(1):369. doi: 10.1186/s12891-025-08601-3.

Cited by

Diagnostic Performance of ChatGPT-4o in Analyzing Oral Mucosal Lesions: A Comparative Study with Experts.

Medicina (Kaunas). 2025 Jul 30;61(8):1379. doi: 10.3390/medicina61081379.

Reliability of Large Language Model-Based Chatbots Versus Clinicians as Sources of Information on Orthodontics: A Comparative Analysis.

Dent J (Basel). 2025 Jul 24;13(8):343. doi: 10.3390/dj13080343.
