Large language models: a new frontier in paediatric cataract patient education.

Affiliations

Rosalind Franklin University of Medicine and Science, Chicago Medical School, North Chicago, Illinois, USA.

Department of Ophthalmology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA.

Publication Information

Br J Ophthalmol. 2024 Sep 20;108(10):1470-1476. doi: 10.1136/bjo-2024-325252.

Abstract

BACKGROUND/AIMS: This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs), ChatGPT-3.5, ChatGPT-4 and Google Bard, to generate novel patient education materials (PEMs) and to improve the readability of existing PEMs on paediatric cataract.

METHODS

We compared the LLMs' responses to three prompts. Prompt A requested a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A, requesting that the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C asked each LLM to rewrite existing PEMs on paediatric cataract 'to a sixth-grade reading level using the SMOG readability formula'. Responses were compared on quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool; ≥70%: understandable, ≥70%: actionable), accuracy (Likert misinformation scale; 1 (no misinformation) to 5 (high misinformation)) and readability (SMOG and Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).
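
For reference, SMOG and FKGL are published formulas computed from simple text counts. The sketch below is a minimal Python illustration of both formulas, using hypothetical word, sentence and syllable counts for an imagined handout; it is not the study's actual scoring pipeline, which the abstract does not describe.

```python
import math

def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade (McLaughlin, 1969): estimates the US school grade
    needed to read a text from its count of words with three or more
    syllables, normalised to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: a weighted combination of average
    sentence length (words/sentence) and word length (syllables/word)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts; both results fall under the grade level <7
# threshold the study treats as 'highly readable'.
print(round(smog_grade(polysyllables=4, sentences=30), 2))    # 5.22
print(round(fkgl(words=450, sentences=45, syllables=580), 2)) # 3.52
```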

RESULTS

All LLM-generated responses were of high quality (median DISCERN ≥4), understandable (≥70%) and accurate (Likert=1), but none was actionable (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than their prompt A responses (p<0.001). ChatGPT-4 produced more readable responses than the other two LLMs (lower SMOG and FKGL scores: 5.59±0.5 and 4.31±0.7, respectively; p<0.001) and consistently rewrote existing PEMs to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).

CONCLUSION

LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.
