
Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry.

Author Information

Ozdemir Zeyneb Merve, Yapici Emre

Affiliation

Department of Restorative Dentistry, Faculty of Dentistry, Kahramanmaras Sutcu Imam University, Kahramanmaras, Turkey.

Publication Information

J Esthet Restor Dent. 2025 Jul;37(7):1740-1752. doi: 10.1111/jerd.13447. Epub 2025 Mar 2.

Abstract

OBJECTIVE

This study aimed to evaluate the accuracy, reliability, consistency, and readability of responses provided by various artificial intelligence (AI) programs to questions related to Restorative Dentistry.

MATERIALS AND METHODS

Forty-five knowledge-based questions and 20 additional questions (10 patient-related and 10 dentistry-specific) were posed to the ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Chatsonic, Copilot, and Gemini Advanced chatbots. The DISCERN questionnaire was used to assess reliability; Flesch Reading Ease and Flesch-Kincaid Grade Level scores were used to evaluate readability. Accuracy and consistency were determined from the chatbots' responses to the knowledge-based questions.
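The two readability metrics used here are standard formulas over word, sentence, and syllable counts. A minimal sketch of how such scores are computed (the function name and the example counts are illustrative, not taken from the study):

```python
def readability_scores(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level).

    FRE: higher = easier to read (0-100 scale in practice).
    FKGL: approximate U.S. school grade needed to understand the text.
    """
    wps = words / sentences    # average words per sentence
    spw = syllables / words    # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

# Hypothetical example: a 100-word chatbot response in 5 sentences with 150 syllables
fre, fkgl = readability_scores(100, 5, 150)
# fre = 59.635 ("fairly difficult"); fkgl = 9.91 (about 10th grade),
# above the 6th-8th grade level typically recommended for patient education materials
```

In practice, tools that compute these scores differ mainly in their syllable-counting heuristics, which is why reported values can vary slightly between programs.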

RESULTS

ChatGPT-4, ChatGPT-4o, Chatsonic, and Copilot demonstrated "good" reliability, while ChatGPT-3.5 and Gemini Advanced showed "fair" reliability. Chatsonic exhibited the highest "DISCERN total score" for patient-related questions, while ChatGPT-4o performed best for dentistry-specific questions. No significant differences were found in readability among the chatbots (p > 0.05). ChatGPT-4o showed the highest accuracy (93.3%) for knowledge-based questions, while Copilot had the lowest (68.9%). ChatGPT-4 demonstrated the highest consistency between repetitions.

CONCLUSION

The performance of the AI chatbots varied in accuracy, reliability, consistency, and readability when responding to Restorative Dentistry questions. ChatGPT-4o and Chatsonic showed promising results for academic and patient education applications. However, the readability of responses was generally above recommended levels for patient education materials.

CLINICAL SIGNIFICANCE

AI is having an increasing impact on many aspects of dentistry. If chatbot responses to patient-related and dentistry-specific questions in restorative dentistry prove reliable and comprehensible, they could become a valuable adjunct to patient education and clinical practice.

