Pritzker School of Medicine, University of Chicago, Chicago, IL, United States.
Section of Dermatology, University of Chicago Medical Center, Chicago, IL, United States.
JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
Dermatologic patient education materials (PEMs) are often written above the national average seventh- to eighth-grade reading level. ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT are large language models (LLMs) that are responsive to user prompts. Our project assesses their use in generating dermatologic PEMs at specified reading levels.
This study aims to assess the ability of select LLMs to generate PEMs for common and rare dermatologic conditions at unspecified and specified reading levels. It further aims to determine whether meaning is preserved in these LLM-generated PEMs, as judged by dermatology resident trainees.
The Flesch-Kincaid reading level (FKRL) of current American Academy of Dermatology PEMs was evaluated for 4 common (atopic dermatitis, acne vulgaris, psoriasis, and herpes zoster) and 4 rare (epidermolysis bullosa, bullous pemphigoid, lamellar ichthyosis, and lichen planus) dermatologic conditions. We prompted ChatGPT-3.5, GPT-4, DermGPT, and DocsGPT with "Create a patient education handout about [condition] at a [FKRL]" to iteratively generate 10 PEMs per condition at unspecified, fifth-grade, and seventh-grade FKRLs, and evaluated the outputs with Microsoft Word readability statistics. The preservation of meaning across LLMs was assessed by 2 dermatology resident trainees.
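For readers who want to approximate the readability scoring outside Microsoft Word, the Flesch-Kincaid grade level is defined as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is a minimal, illustrative Python implementation assuming a naive regex-based sentence splitter and vowel-group syllable counter; it is not the tool used in this study, and its counts may differ slightly from Word's readability statistics.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels; assume at least one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Split into sentences on ., !, ? and extract alphabetic words.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

if __name__ == "__main__":
    sample = "Eczema makes skin dry and itchy. Gentle creams help the skin heal."
    print(round(flesch_kincaid_grade(sample), 2))
```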
The current American Academy of Dermatology PEMs had an average (SD) FKRL of 9.35 (1.26) and 9.50 (2.3) for common and rare diseases, respectively. For common diseases, the FKRLs of LLM-produced PEMs ranged between 9.8 and 11.21 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). For rare diseases, the FKRLs of LLM-produced PEMs ranged between 9.85 and 11.45 (unspecified prompt), between 4.22 and 7.43 (fifth-grade prompt), and between 5.98 and 7.28 (seventh-grade prompt). At the fifth-grade reading level, GPT-4 was better at producing PEMs for both common and rare conditions than ChatGPT-3.5 (P=.001 and P=.01, respectively), DermGPT (P<.001 and P=.03, respectively), and DocsGPT (P<.001 and P=.02, respectively). At the seventh-grade reading level, no significant difference was found between ChatGPT-3.5, GPT-4, DocsGPT, or DermGPT in producing PEMs for common conditions (all P>.05); however, for rare conditions, ChatGPT-3.5 and DocsGPT outperformed GPT-4 (P=.003 and P<.001, respectively). The preservation of meaning analysis revealed that for common conditions, DermGPT ranked the highest for overall ease of reading, patient understandability, and accuracy (14.75/15, 98%); for rare conditions, handouts generated by GPT-4 ranked the highest (14.5/15, 97%).
GPT-4 appeared to outperform ChatGPT-3.5, DocsGPT, and DermGPT at the fifth-grade FKRL for both common and rare conditions, although both ChatGPT-3.5 and DocsGPT performed better than GPT-4 at the seventh-grade FKRL for rare conditions. LLM-produced PEMs may reliably meet seventh-grade FKRLs for select common and rare dermatologic conditions and are easy to read, understandable for patients, and mostly accurate. LLMs may play a role in enhancing health literacy and disseminating accessible, understandable PEMs in dermatology.