Hassan Mohamed G, Abdelaziz Ahmed A, Abdelrahman Hams H, Mohamed Mostafa M Y, Ellabban Mohamed T
Division of Bone and Mineral Diseases, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA.
Department of Orthodontics, Faculty of Dentistry, Assiut University, Assiut, Egypt.
Orthod Craniofac Res. 2025 May 7. doi: 10.1111/ocr.12939.
Temporomandibular disorders (TMDs) are a common group of conditions affecting the temporomandibular joint (TMJ), often resulting from factors such as injury, stress or teeth grinding. This study aimed to evaluate the accuracy, completeness, reliability and readability of the responses generated by ChatGPT-3.5, ChatGPT-4o and Google Gemini to TMD-related inquiries. Forty-five questions covering various aspects of TMDs were created by two experts and submitted by one author to ChatGPT-3.5, ChatGPT-4o and Google Gemini on the same day. The responses were evaluated for accuracy, completeness and reliability using modified Likert scales. Readability was analysed with six validated indices via a specialised tool. Additional features, such as the inclusion of graphical elements, references and safeguard mechanisms, were also documented and analysed. The Pearson Chi-Square and One-Way ANOVA tests were used for data analysis. Google Gemini achieved the highest accuracy, providing 100% correct responses, followed by ChatGPT-3.5 (95.6%) and ChatGPT-4o (93.3%). ChatGPT-4o provided the most complete responses (91.1%), followed by ChatGPT-3.5 (64.4%) and Google Gemini (42.2%). The majority of responses were reliable: 93.3% of ChatGPT-4o responses were rated 'Absolutely Reliable', compared with 46.7% for ChatGPT-3.5 and 48.9% for Google Gemini. ChatGPT-4o and Google Gemini included references in 22.2% and 13.3% of their responses, respectively, while ChatGPT-3.5 included none. Google Gemini was the only model that included multimedia (6.7% of responses). Readability scores were highest for ChatGPT-3.5, suggesting its responses were more complex than those of Google Gemini and ChatGPT-4o. Both ChatGPT-4o and Google Gemini demonstrated accuracy and reliability in addressing TMD-related questions, with their responses being clear, easy to understand and complemented by safeguard statements encouraging specialist consultation. However, both platforms lacked evidence-based references. Only Google Gemini incorporated multimedia elements into its answers.