Postacı Sevinç Arzu, Dal Ali
Mustafa Kemal University, Tayfur Sökmen Faculty of Medicine, Department of Ophthalmology, Hatay, Türkiye.
Turk J Ophthalmol. 2024 Dec 31;54(6):330-336. doi: 10.4274/tjo.galenos.2024.58295.
This study compared the readability of patient education materials from the Turkish Ophthalmological Association (TOA) retinopathy of prematurity (ROP) guidelines with those generated by large language models (LLMs). The ability of GPT-4.0, GPT-4o mini, and Gemini to produce patient education materials was evaluated in terms of accuracy and comprehensiveness.
Thirty questions from the TOA ROP guidelines were posed to GPT-4.0, GPT-4o mini, and Gemini. Their responses were then reformulated using the prompts "Can you revise this text to be understandable at a 6th-grade reading level?" (P1 format) and "Can you make this text easier to understand?" (P2 format). The readability of the TOA ROP guidelines and the LLM-generated responses was analyzed using the Ateşman and Bezirci-Yılmaz formulas. Additionally, ROP specialists evaluated the comprehensiveness and accuracy of the responses.
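The abstract does not reproduce the two readability formulas themselves. As a point of reference, a minimal Python sketch of how such scores are typically computed follows, using the commonly cited coefficients for the Ateşman (1997) and Bezirci-Yılmaz (2010) formulas and the standard Turkish convention that syllable count equals vowel count; the study's exact implementation may differ.

```python
import re

# Turkish vowels (syllable count in Turkish equals vowel count).
VOWELS = set("aeıioöuüAEIİOÖUÜ")

def count_syllables(word: str) -> int:
    return sum(1 for ch in word if ch in VOWELS)

def readability_scores(text: str) -> tuple[float, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÇĞİÖŞÜçğıöşü]+", text)
    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)

    # Ateşman (1997), a Flesch adaptation for Turkish:
    # score = 198.825 - 40.175*(syllables/word) - 2.610*(words/sentence);
    # higher scores indicate easier text (0-100 scale).
    atesman = (198.825
               - 40.175 * (sum(syllables) / n_words)
               - 2.610 * (n_words / n_sent))

    # Bezirci-Yılmaz (2010), which estimates a grade level:
    # grade = sqrt(avg_words_per_sentence *
    #              (h3*0.84 + h4*1.5 + h5*3.5 + h6*26.25)),
    # where h_k is the average number of k-syllable words per sentence
    # (h6 counts words with six or more syllables).
    h3 = sum(1 for s in syllables if s == 3) / n_sent
    h4 = sum(1 for s in syllables if s == 4) / n_sent
    h5 = sum(1 for s in syllables if s == 5) / n_sent
    h6 = sum(1 for s in syllables if s >= 6) / n_sent
    bezirci = ((n_words / n_sent)
               * (h3 * 0.84 + h4 * 1.5 + h5 * 3.5 + h6 * 26.25)) ** 0.5

    return atesman, bezirci
```

Under this convention, a lower Bezirci-Yılmaz value (grade level) and a higher Ateşman score both indicate more readable text, which is how materials at or below a 6th-grade level would be identified.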
The TOA brochure was found to have a reading level above the 6th-grade level recommended in the literature. Materials generated by GPT-4.0 and Gemini had significantly greater readability than the TOA brochure (p<0.05). Adjustments made in the P1 and P2 formats improved readability for GPT-4.0, while no significant change was observed for GPT-4o mini or Gemini. GPT-4.0 had the highest scores for accuracy and comprehensiveness, while Gemini had the lowest.
GPT-4.0 appeared to have greater potential for generating more readable, accurate, and comprehensive patient education materials. However, when integrating LLMs into the healthcare field, regional medical differences and the accuracy of the provided information must be carefully assessed.