• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

评估大型语言模型(ChatGPT-4、Claude 3、Gemini和Microsoft Copilot)对早产儿视网膜病变常见问题的回答:一项关于可读性和适宜性的研究

Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.

作者信息

Ermis Serhat, Özal Ece, Karapapak Murat, Kumantaş Ebrar, Özal Sadık Altan

出版信息

J Pediatr Ophthalmol Strabismus. 2025 Mar-Apr;62(2):84-95. doi: 10.3928/01913913-20240911-05. Epub 2024 Oct 28.

DOI:10.3928/01913913-20240911-05
PMID:39465590
Abstract

PURPOSE

To assess the appropriateness and readability of responses provided by four large language models (LLMs) (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to parents' queries pertaining to retinopathy of prematurity (ROP).

METHODS

A total of 60 frequently asked questions were collated and categorized into six distinct sections. The responses generated by the LLMs were evaluated by three experienced ROP specialists to determine their appropriateness and comprehensiveness. Additionally, the readability of the responses was assessed using a range of metrics, including the Flesch-Kincaid Grade Level (FKGL), Gunning Fog (GF) Index, Coleman-Liau (CL) Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease (FRE) score.

RESULTS

ChatGPT-4 demonstrated the highest level of appropriateness (100%) and performed exceptionally well in the Likert analysis, scoring 5 points on 96% of questions. The CL Index and FRE scores identified Gemini as the most readable LLM, whereas the GF Index and SMOG Index rated Microsoft Copilot as the most readable. Nevertheless, ChatGPT-4 exhibited the most intricate text structure, with scores of 18.56 on the GF Index, 18.56 on the CL Index, 17.2 on the SMOG Index, and 9.45 on the FRE score. This suggests that the responses demand a college-level comprehension.

CONCLUSIONS

ChatGPT-4 demonstrated higher performance than other LLMs in responding to questions related to ROP; however, its texts were more complex. In terms of readability, Gemini and Microsoft Copilot were found to be more successful. .

摘要

目的

评估四种大语言模型(LLMs)(ChatGPT-4、Claude 3、Gemini和Microsoft Copilot)对家长有关早产儿视网膜病变(ROP)问题的回答的恰当性和可读性。

方法

共整理了60个常见问题,并将其分为六个不同部分。由三位经验丰富的ROP专家对大语言模型生成的回答进行评估,以确定其恰当性和全面性。此外,使用一系列指标评估回答的可读性,包括弗莱施-金凯德年级水平(FKGL)、冈宁雾度(GF)指数、科尔曼-廖(CL)指数、简单晦涩度测量(SMOG)指数和弗莱施阅读简易度(FRE)得分。

结果

ChatGPT-4表现出最高的恰当性水平(100%),在李克特分析中表现出色,96%的问题得分为5分。CL指数和FRE得分表明Gemini是最易读的大语言模型,而GF指数和SMOG指数则将Microsoft Copilot评为最易读的。然而,ChatGPT-4展现出最复杂的文本结构,GF指数得分为18.56,CL指数得分为18.56,SMOG指数得分为17.2,FRE得分9.45。这表明回答需要大学水平的理解能力。

结论

ChatGPT-4在回答与ROP相关问题上比其他大语言模型表现更好;然而,其文本更复杂。在可读性方面,Gemini和Microsoft Copilot更成功。

相似文献

1
Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.评估大型语言模型(ChatGPT-4、Claude 3、Gemini和Microsoft Copilot)对早产儿视网膜病变常见问题的回答:一项关于可读性和适宜性的研究
J Pediatr Ophthalmol Strabismus. 2025 Mar-Apr;62(2):84-95. doi: 10.3928/01913913-20240911-05. Epub 2024 Oct 28.
2
Readability, accuracy and appropriateness and quality of AI chatbot responses as a patient information source on root canal retreatment: A comparative assessment.作为根管再治疗患者信息来源的人工智能聊天机器人回复的可读性、准确性、恰当性和质量:一项比较评估。
Int J Med Inform. 2025 Sep;201:105948. doi: 10.1016/j.ijmedinf.2025.105948. Epub 2025 Apr 25.
3
Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.评估人工智能聊天机器人提供的心脏导管插入术患者教育材料的可读性:一项观察性横断面研究。
Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
4
Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.评估人工智能聊天机器人提供的关于化疗心脏毒性的患者教育材料的质量和可读性:一项观察性横断面研究。
Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.
5
Assessing the Responses of Large Language Models (ChatGPT-4, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Breast Imaging: A Study on Readability and Accuracy.评估大语言模型(ChatGPT-4、Gemini和Microsoft Copilot)对乳腺成像常见问题的回答:可读性和准确性研究
Cureus. 2024 May 9;16(5):e59960. doi: 10.7759/cureus.59960. eCollection 2024 May.
6
Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots.评估聊天机器人对眼内炎常见问题回答的可靠性和可读性:一项关于聊天机器人的横断面研究。
Health Informatics J. 2024 Oct-Dec;30(4):14604582241304679. doi: 10.1177/14604582241304679.
7
Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity.探索ChatGPT-4、必应人工智能和Gemini作为虚拟顾问在向家庭普及早产儿视网膜病变知识方面的作用。
Children (Basel). 2024 Jun 20;11(6):750. doi: 10.3390/children11060750.
8
Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.大型语言模型在新冠肺炎对妊娠影响方面的熟练度、清晰度和客观性与专家知识对比:横断面试点研究
JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.
9
Investigating the role of large language models on questions about refractive surgery.探究大语言模型在屈光手术相关问题上的作用。
Int J Med Inform. 2025 Mar;195:105787. doi: 10.1016/j.ijmedinf.2025.105787. Epub 2025 Jan 6.
10
Readability and Appropriateness of Responses Generated by ChatGPT 3.5, ChatGPT 4.0, Gemini, and Microsoft Copilot for FAQs in Refractive Surgery.ChatGPT 3.5、ChatGPT 4.0、Gemini和Microsoft Copilot针对屈光手术常见问题生成的回复的可读性和恰当性。
Turk J Ophthalmol. 2024 Dec 31;54(6):313-317. doi: 10.4274/tjo.galenos.2024.28234.

引用本文的文献

1
Comparative analysis of language models in addressing syphilis-related queries.语言模型在处理梅毒相关查询方面的比较分析。
Med Oral Patol Oral Cir Bucal. 2025 Jul 1;30(4):e551-e560. doi: 10.4317/medoral.27092.
2
Evaluating the Accuracy, Reliability, Consistency, and Readability of Different Large Language Models in Restorative Dentistry.评估不同大语言模型在口腔修复学中的准确性、可靠性、一致性和可读性。
J Esthet Restor Dent. 2025 Jul;37(7):1740-1752. doi: 10.1111/jerd.13447. Epub 2025 Mar 2.