Postacı Sevinç Arzu, Dal Ali
Mustafa Kemal University, Tayfur Sökmen Faculty of Medicine, Department of Ophthalmology, Hatay, Türkiye.
Turk J Ophthalmol. 2024 Dec 31;54(6):330-336. doi: 10.4274/tjo.galenos.2024.58295.
This study compared the readability of patient education materials from the Turkish Ophthalmological Association (TOA) retinopathy of prematurity (ROP) guidelines with those generated by large language models (LLMs). The ability of GPT-4.0, GPT-4o mini, and Gemini to produce patient education materials was evaluated in terms of accuracy and comprehensiveness.
Thirty questions from the TOA ROP guidelines were posed to GPT-4.0, GPT-4o mini, and Gemini. Their responses were then reformulated using the prompts "Can you revise this text to be understandable at a 6th-grade reading level?" (P1 format) and "Can you make this text easier to understand?" (P2 format). The readability of the TOA ROP guidelines and the LLM-generated responses was analyzed using the Ateşman and Bezirci-Yılmaz formulas. Additionally, ROP specialists evaluated the comprehensiveness and accuracy of the responses.
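The abstract does not reproduce the two readability formulas themselves. As a point of reference, a minimal Python sketch of how such scores are typically computed follows, using the commonly cited coefficients for the Ateşman (1997) and Bezirci-Yılmaz (2010) formulas and the standard Turkish convention that syllable count equals vowel count; the study's exact implementation may differ.

```python
import re

# Turkish vowels (syllable count in Turkish equals vowel count).
VOWELS = set("aeıioöuüAEIİOÖUÜ")

def count_syllables(word: str) -> int:
    return sum(1 for ch in word if ch in VOWELS)

def readability_scores(text: str) -> tuple[float, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÇĞİÖŞÜçğıöşü]+", text)
    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)

    # Ateşman (1997), a Flesch adaptation for Turkish:
    # score = 198.825 - 40.175*(syllables/word) - 2.610*(words/sentence);
    # higher scores indicate easier text (0-100 scale).
    atesman = (198.825
               - 40.175 * (sum(syllables) / n_words)
               - 2.610 * (n_words / n_sent))

    # Bezirci-Yılmaz (2010), which estimates a grade level:
    # grade = sqrt(avg_words_per_sentence *
    #              (h3*0.84 + h4*1.5 + h5*3.5 + h6*26.25)),
    # where h_k is the average number of k-syllable words per sentence
    # (h6 counts words with six or more syllables).
    h3 = sum(1 for s in syllables if s == 3) / n_sent
    h4 = sum(1 for s in syllables if s == 4) / n_sent
    h5 = sum(1 for s in syllables if s == 5) / n_sent
    h6 = sum(1 for s in syllables if s >= 6) / n_sent
    bezirci = ((n_words / n_sent)
               * (h3 * 0.84 + h4 * 1.5 + h5 * 3.5 + h6 * 26.25)) ** 0.5

    return atesman, bezirci
```

Under this convention, a lower Bezirci-Yılmaz value (grade level) and a higher Ateşman score both indicate more readable text, which is how materials at or below a 6th-grade level would be identified.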
The TOA brochure was found to have a reading level above the 6th-grade level recommended in the literature. Materials generated by GPT-4.0 and Gemini had significantly greater readability than the TOA brochure (p<0.05). Adjustments made in the P1 and P2 formats improved readability for GPT-4.0, while no significant change was observed for GPT-4o mini or Gemini. GPT-4.0 had the highest scores for accuracy and comprehensiveness, while Gemini had the lowest.
GPT-4.0 appeared to have greater potential for generating more readable, accurate, and comprehensive patient education materials. However, when integrating LLMs into the healthcare field, regional medical differences and the accuracy of the provided information must be carefully assessed.