Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity.

Author information

Durmaz Engin Ceren, Karatas Ezgi, Ozturk Taylan

Affiliations

Department of Ophthalmology, Izmir Democracy University, Buca Seyfi Demirsoy Education and Research Hospital, Izmir 35390, Turkey.

Department of Biomedical Technologies, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey.

Publication information

Children (Basel). 2024 Jun 20;11(6):750. doi: 10.3390/children11060750.

Abstract

BACKGROUND

Large language models (LLMs) are increasingly used to provide medical information. Our aim is to evaluate how effectively artificial intelligence (AI) LLMs such as ChatGPT-4, BingAI, and Gemini respond to patient inquiries about retinopathy of prematurity (ROP).

METHODS

Three ophthalmologists rated the LLMs' answers to fifty real-life patient inquiries on a 5-point Likert scale. The responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability with the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index.
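
For readers unfamiliar with the three readability indices, the following is a minimal Python sketch of their standard published formulas. It is not the tooling the authors used, which the abstract does not specify. The function names, the regex-based tokenization, and the vowel-group syllable counter are all illustrative assumptions; dedicated readability calculators count syllables more carefully and may give slightly different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Count sentences, words, letters, and syllables in English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    letters = sum(ch.isalpha() for w in words for ch in w)
    syllables = sum(count_syllables(w) for w in words)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": letters,
        "syllables": syllables,
    }


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(words: int, sentences: int, letters: int) -> float:
    """Coleman-Liau: grade level based on letters rather than syllables."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

To score a response, pass its text to `text_stats` and feed the resulting counts to each formula. On the FRE scale, scores between 30 and 50 are conventionally read as difficult, college-level text.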

RESULTS

ChatGPT-4 outperformed BingAI and Gemini, receiving the top score of 5 points for 90% (45 of 50) of its responses and ratings of "agreed" or "strongly agreed" for 98% (49 of 50). It led in accuracy and reliability, with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini had the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed, particularly in the screening, diagnosis, and treatment categories.

CONCLUSION

ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per DISCERN and EQIP assessments.
