
The Ability of Large Language Models to Generate Patient Information Materials for Retinopathy of Prematurity: Evaluation of Readability, Accuracy, and Comprehensiveness.

Author Information

Postacı Sevinç Arzu, Dal Ali

Affiliation

Mustafa Kemal University, Tayfur Sökmen Faculty of Medicine, Department of Ophthalmology, Hatay, Türkiye.

Publication Information

Turk J Ophthalmol. 2024 Dec 31;54(6):330-336. doi: 10.4274/tjo.galenos.2024.58295.

Abstract

OBJECTIVES

This study compared the readability of patient education materials from the Turkish Ophthalmological Association (TOA) retinopathy of prematurity (ROP) guidelines with those generated by large language models (LLMs). The ability of GPT-4.0, GPT-4o mini, and Gemini to produce patient education materials was evaluated in terms of accuracy and comprehensiveness.

MATERIALS AND METHODS

Thirty questions from the TOA ROP guidelines were posed to GPT-4.0, GPT-4o mini, and Gemini. Their responses were then reformulated using the prompts "Can you revise this text to be understandable at a 6-grade reading level?" (P1 format) and "Can you make this text easier to understand?" (P2 format). The readability of the TOA ROP guidelines and the LLM-generated responses was analyzed using the Ateşman and Bezirci-Yılmaz formulas. Additionally, ROP specialists evaluated the comprehensiveness and accuracy of the responses.
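The abstract names the two Turkish readability formulas but does not state them. As a minimal sketch, assuming the Ateşman formula as it is commonly cited for Turkish (score = 198.825 − 40.175 × syllables-per-word − 2.610 × words-per-sentence, where higher scores mean easier text), the readability analysis could be reproduced along these lines. Note that in Turkish orthography the syllable count of a word equals its vowel count, which is what makes the simple counter below workable.

```python
# Hedged sketch of the Ateşman readability score for Turkish text.
# Formula constants are as commonly cited (not given in the abstract):
#   score = 198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences)
# Higher scores indicate easier text.

import re

# Turkish has eight vowels; syllable count == vowel count.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")

def count_syllables(word: str) -> int:
    """Count syllables in a Turkish word by counting its vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)

def atesman_score(text: str) -> float:
    """Compute the Ateşman readability score for a Turkish text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (198.825
            - 40.175 * (syllables / len(words))
            - 2.610 * (len(words) / len(sentences)))
```

The Bezirci-Yılmaz formula used alongside it maps text to an approximate grade level rather than a 0-100 ease score; its weighting constants are omitted here to avoid guessing them.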

RESULTS

The TOA brochure was found to have a reading level above the 6th-grade level recommended in the literature. Materials generated by GPT-4.0 and Gemini had significantly greater readability than the TOA brochure (p<0.05). Adjustments made in the P1 and P2 formats improved readability for GPT-4.0, while no significant change was observed for GPT-4o mini or Gemini. GPT-4.0 had the highest scores for accuracy and comprehensiveness, while Gemini had the lowest.

CONCLUSION

GPT-4.0 appeared to have greater potential for generating more readable, accurate, and comprehensive patient education materials. However, when integrating LLMs into the healthcare field, regional medical differences and the accuracy of the provided information must be carefully assessed.

