

Readability, accuracy and appropriateness and quality of AI chatbot responses as a patient information source on root canal retreatment: A comparative assessment.

Author information

Büker Mine, Mercan Gamze

Affiliation

Department of Endodontics, Faculty of Dentistry, Mersin University, Mersin, Turkey.

Publication information

Int J Med Inform. 2025 Sep;201:105948. doi: 10.1016/j.ijmedinf.2025.105948. Epub 2025 Apr 25.

DOI: 10.1016/j.ijmedinf.2025.105948
PMID: 40288015
Abstract

AIM

This study aimed to assess the readability, accuracy, appropriateness, and overall quality of responses generated by large language models (LLMs), including ChatGPT-3.5, Microsoft Copilot, and Gemini (Version 2.0 Flash), to frequently asked questions (FAQs) related to root canal retreatment.

METHODS

Three LLM chatbots (ChatGPT-3.5, Microsoft Copilot, and Gemini, Version 2.0 Flash) were assessed based on their responses to 10 patient FAQs. Readability was analyzed using seven indices: the Flesch Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), Gunning Fog index (GFOG), Linsear Write (LW), Coleman-Liau (CL), and Automated Readability Index (ARI), each compared against the recommended sixth-grade reading level. Response quality was evaluated using the Global Quality Scale (GQS), while accuracy and appropriateness were rated on a five-point Likert scale by two independent reviewers. Statistical analyses used one-way ANOVA with Tukey or Games-Howell post-hoc tests for continuous variables; Spearman's correlation test was used to assess associations between categorical variables.
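For context, the two Flesch metrics named above are simple formulas over word, sentence, and syllable counts. A minimal Python sketch follows; it uses a naive vowel-group syllable heuristic (an assumption of this sketch, not the study's tooling), so its scores will differ slightly from published readability calculators:

```python
import re

def flesch_metrics(text):
    """Return (FRES, FKGL) for an English text.

    FRES = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Count vowel groups; assume at least one syllable per word
        # and drop a silent trailing 'e' when another syllable exists.
        groups = re.findall(r"[aeiouy]+", word.lower())
        count = len(groups)
        if word.lower().endswith("e") and count > 1:
            count -= 1
        return max(count, 1)

    total_syllables = sum(syllables(w) for w in words)
    wps = len(words) / len(sentences)    # average words per sentence
    spw = total_syllables / len(words)   # average syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Longer sentences and more syllables per word push FRES down and FKGL up, which is why dense chatbot prose lands above the sixth-grade target.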

RESULTS

All chatbots generated responses exceeding the recommended readability level, making them suitable for readers at or above the 10th-grade level. No significant difference was found between ChatGPT-3.5 and Microsoft Copilot, while Gemini produced significantly more readable responses (p < 0.05). Gemini demonstrated the highest proportion of accurate (80%) and high-quality responses (80%) compared to ChatGPT-3.5 and Microsoft Copilot.

CONCLUSIONS

None of the chatbots met the recommended readability standards for patient education materials. While Gemini demonstrated better readability, accuracy, and quality, all three models require further optimization to enhance accessibility and reliability in patient communication.


Similar articles

1
Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.
2
Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
3
Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.
J Pediatr Ophthalmol Strabismus. 2025 Mar-Apr;62(2):84-95. doi: 10.3928/01913913-20240911-05. Epub 2024 Oct 28.
4
Readability and Appropriateness of Responses Generated by ChatGPT 3.5, ChatGPT 4.0, Gemini, and Microsoft Copilot for FAQs in Refractive Surgery.
Turk J Ophthalmol. 2024 Dec 31;54(6):313-317. doi: 10.4274/tjo.galenos.2024.28234.
5
Can artificial intelligence models serve as patient information consultants in orthodontics?
BMC Med Inform Decis Mak. 2024 Jul 29;24(1):211. doi: 10.1186/s12911-024-02619-8.
6
Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy.
Cureus. 2024 May 9;16(5):e59960. doi: 10.7759/cureus.59960. eCollection 2024 May.
7
Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.
Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.
8
Evaluating the Quality and Readability of Information Provided by Generative Artificial Intelligence Chatbots on Clavicle Fracture Treatment Options.
Cureus. 2025 Jan 9;17(1):e77200. doi: 10.7759/cureus.77200. eCollection 2025 Jan.
9
Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.

Cited by

1
Reliability of Large Language Model-Based Chatbots Versus Clinicians as Sources of Information on Orthodontics: A Comparative Analysis.
Dent J (Basel). 2025 Jul 24;13(8):343. doi: 10.3390/dj13080343.