Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity.

Authors

Durmaz Engin Ceren, Karatas Ezgi, Ozturk Taylan

Affiliations

Department of Ophthalmology, Izmir Democracy University, Buca Seyfi Demirsoy Education and Research Hospital, Izmir 35390, Turkey.

Department of Biomedical Technologies, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey.

Publication information

Children (Basel). 2024 Jun 20;11(6):750. doi: 10.3390/children11060750.

DOI:10.3390/children11060750
PMID:38929329
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11202218/
Abstract

BACKGROUND

Large language models (LLMs) are becoming increasingly important as they are used more frequently to provide medical information. Our aim is to evaluate the effectiveness of artificial intelligence (AI) LLMs, such as ChatGPT-4, BingAI, and Gemini, in responding to patient inquiries about retinopathy of prematurity (ROP).

METHODS

Three ophthalmologists assessed the LLMs' answers to fifty real-life patient inquiries using a 5-point Likert scale. The models' responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index.
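The three readability indices named above are closed-form functions of word, sentence, syllable, and letter counts. The Python sketch below uses the standard published coefficients for each index. The tokenization rules and the syllable counter are simplistic assumptions made for illustration, not the tooling used in the study, so scores computed this way will not exactly match those reported.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """CLI: grade level from letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def count_syllables(word: str) -> int:
    """Hypothetical helper: count vowel groups, minus a silent trailing 'e'.

    This is a rough heuristic (an assumption of this sketch); dictionary-based
    counters are more accurate.
    """
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text: str) -> dict:
    """Hypothetical helper: derive the raw counts with naive regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "syllables": sum(count_syllables(w) for w in words),
        "letters": sum(ch.isalpha() for ch in text),
    }


if __name__ == "__main__":
    sample = "Retinopathy of prematurity is an eye disease. It affects premature infants."
    c = text_counts(sample)
    print("FRE :", round(flesch_reading_ease(c["words"], c["sentences"], c["syllables"]), 1))
    print("FKGL:", round(flesch_kincaid_grade(c["words"], c["sentences"], c["syllables"]), 1))
    print("CLI :", round(coleman_liau_index(c["letters"], c["words"], c["sentences"]), 1))
```

A higher FRE means easier text, whereas FKGL and the Coleman-Liau Index both approximate a US school grade level, so higher values there mean harder text.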

RESULTS

ChatGPT-4 outperformed BingAI and Gemini, receiving the top score of 5 points in 90% (45 out of 50) of responses and ratings of "agreed" or "strongly agreed" in 98% (49 out of 50). It led in accuracy and reliability with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini had the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed, particularly in the screening, diagnosis, and treatment categories.

CONCLUSION

ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per DISCERN and EQIP assessments.
