Nuran Özyemişci, Bilge Turhan Bal, Merve Bankoğlu Güngör, Esra Kaynak Öztürk, Ayşegül Canvar, Seçil Karakoca Nemli
Associate Professor, Dental Prosthesis Technology, Vocational School of Health Services, Hacettepe University, Ankara, Turkey.
Professor, Department of Prosthodontics, Faculty of Dentistry, Gazi University, Ankara, Turkey.
J Prosthet Dent. 2025 Sep 8. doi: 10.1016/j.prosdent.2025.08.028.
Despite advances in artificial intelligence (AI), the quality, reliability, and understandability of health-related information provided by chatbots remain uncertain. Furthermore, studies on the maxillofacial prosthesis (MP) information provided by AI chatbots are lacking.
The purpose of this study was to assess and compare the reliability, quality, readability, and similarity of the responses generated by 4 different chatbots to MP-related questions.
A total of 15 questions were prepared by a maxillofacial prosthodontist, and responses were obtained from 4 different chatbots (ChatGPT-3.5, Gemini 2.5 Flash, Copilot, and DeepSeek V3). A reliability score (adapted DISCERN instrument), the Global Quality Scale (GQS), the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Reading Grade Level (FKRGL), and a similarity index (iThenticate) were used to evaluate the performance of the chatbots. Data were compared by using the Kruskal-Wallis test, and differences between chatbots were determined with the Conover multiple comparison test with Benjamini-Hochberg correction (α=.05).
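For reference, the 2 readability metrics named above are computed from standard counts of words, sentences, and syllables; these are the published Flesch formulas, not study-specific values:

\[ \mathrm{FRES} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right) \]

\[ \mathrm{FKRGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59 \]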
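The statistical comparison described above can be sketched in a few lines of Python. This is a minimal illustration using SciPy and the scikit-posthocs library with hypothetical example scores (the study's data are not reproduced here); it is not the authors' analysis script.

import scipy.stats as stats
import scikit_posthocs as sp

# Hypothetical GQS scores for the 4 chatbots (illustrative values only)
scores = {
    "ChatGPT-3.5":      [4, 5, 4, 4, 5, 4, 5, 4],
    "Gemini 2.5 Flash": [3, 4, 4, 3, 4, 3, 4, 4],
    "Copilot":          [4, 4, 3, 4, 4, 4, 3, 4],
    "DeepSeek V3":      [5, 4, 4, 5, 4, 4, 5, 4],
}
groups = list(scores.values())

# Omnibus Kruskal-Wallis test across the 4 chatbots
h, p = stats.kruskal(*groups)
print(f"Kruskal-Wallis: H = {h:.3f}, P = {p:.3f}")

# Pairwise Conover post hoc test with Benjamini-Hochberg correction,
# mirroring the protocol above (alpha = .05)
if p < 0.05:
    pairwise = sp.posthoc_conover(groups, p_adjust="fdr_bh")
    print(pairwise)  # matrix of BH-adjusted pairwise P values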
The DISCERN scores of the chatbots did not differ significantly, except for 1 question for which ChatGPT showed significantly higher reliability than Gemini and Copilot (P=.03). No statistically significant differences were found among the chatbots in GQS (P=.096), FRES (P=.166), or FKRGL (P=.247) values. The similarity rate of Gemini was significantly higher than that of the other chatbots (P=.03).
ChatGPT-3.5, Gemini 2.5 Flash, Copilot, and DeepSeek V3 generated responses of good quality. However, the responses of all chatbots were difficult for nonprofessionals to read and understand. Similarity rates were low for all chatbots except Gemini, indicating that the information they provided was largely original.