Currently Available Large Language Models Are Moderately Effective in Improving Readability of English and Spanish Patient Education Materials in Pediatric Orthopaedics.

Author information

Nian Patrick P, Williams Christopher J, Senthilnathan Ithika S, Marsh Isabella G, Jones Ruth H, Palandjian Pari L, Heyer Jessica H, Doyle Shevaun M

Affiliations

From the Hospital for Special Surgery (Nian, Williams, Senthilnathan, Marsh, Jones, Palandjian, Heyer, Doyle), New York City, NY.

Publication information

J Am Acad Orthop Surg. 2025 Jun 24. doi: 10.5435/JAAOS-D-25-00267.

PMID:40560869

Abstract

INTRODUCTION

Patient education materials (PEMs) consistently exceed the recommended sixth-grade reading level. Poor health literacy and limited English proficiency, particularly among more than 40 million Spanish speakers, are associated with adverse patient outcomes. The use of artificial intelligence (AI) to improve readability has rarely been validated in Spanish PEMs or in pediatric orthopaedic PEMs. This study aimed to (1) assess the availability and readability of English and Spanish pediatric orthopaedic PEMs and (2) compare the efficacy of ChatGPT-4.0 and Google Gemini in improving readability.

METHODS

Pediatric orthopaedic PEMs were collected from the websites of 13 pediatric orthopaedic hospitals and societies. Grade levels were assessed using the Flesch-Kincaid Grade Level (FKGL) and Gunning Fog Index (GFI) for English articles, and the FKGL and Spanish Simple Measure of Gobbledygook (SMOG) for Spanish articles. English and Spanish PEMs were additionally assessed using the Flesch Reading Ease (FRE) and Fernandez-Huerta Index (FHI), respectively. ChatGPT-4.0 and Google Gemini were prompted to rewrite the article text at a sixth-grade level. Readability after AI conversion was compared categorically, by the proportion of articles at ≤sixth-grade level, and continuously, across all metrics.
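For readers unfamiliar with the English metrics named above, the following is a minimal sketch of the standard published FKGL, FRE, and GFI formulas, computed from raw text counts. The abstract does not state which tool the authors used, so the `TextCounts` container, the function names, and the example counts are hypothetical, and the Spanish metrics (Spanish SMOG, FHI) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Raw counts for one article (hypothetical container, not from the study)."""
    words: int
    sentences: int
    syllables: int
    complex_words: int  # words with three or more syllables


def fkgl(c: TextCounts) -> float:
    """Flesch-Kincaid Grade Level: higher means harder to read."""
    return 0.39 * c.words / c.sentences + 11.8 * c.syllables / c.words - 15.59


def fre(c: TextCounts) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * c.words / c.sentences - 84.6 * c.syllables / c.words


def gfi(c: TextCounts) -> float:
    """Gunning Fog Index: estimated years of schooling needed."""
    return 0.4 * (c.words / c.sentences + 100 * c.complex_words / c.words)


def meets_sixth_grade(grade_level: float) -> bool:
    """Categorical outcome used in the abstract: grade level at or below 6."""
    return grade_level <= 6.0


# Made-up counts for illustration only.
example = TextCounts(words=100, sentences=10, syllables=130, complex_words=5)
print(round(fkgl(example), 2), round(fre(example), 2), round(gfi(example), 2))
print(meets_sixth_grade(fkgl(example)))
```

Taking counts rather than raw text keeps the sketch deterministic; in practice, syllable and sentence counting is where readability tools differ most, so scores from different tools may not match exactly.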

RESULTS

Of 103 English articles, 40 (38.8%) were available in Spanish. At baseline, few articles had an FKGL at ≤sixth-grade level: 5.8% of English and 10.0% of Spanish articles. Among English PEMs, 21.4% of ChatGPT-4.0-converted and 60.2% of Google Gemini-converted articles achieved ≤sixth-grade FKGL. Among Spanish PEMs, 52.5% of ChatGPT-4.0-converted and 77.5% of Google Gemini-converted articles achieved ≤sixth-grade FKGL. Compared with ChatGPT-4.0, Google Gemini produced greater absolute improvements in GFI, English FKGL, and Spanish SMOG, and a higher proportion of articles at ≤sixth-grade level (GFI, FKGL, Spanish SMOG) (all P < 0.05).
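The percentages above can be checked against the stated sample sizes. The sketch below assumes that every English percentage uses a denominator of 103 and every Spanish percentage a denominator of 40 (the abstract does not say whether all articles were converted), and recovers the whole-number article counts those percentages imply. The function names and dictionary labels are hypothetical, and the counts are derived here, not reported by the authors.

```python
def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of articles matching a reported percentage."""
    return round(percent / 100 * total)


def is_consistent(percent: float, total: int) -> bool:
    """True if the implied count rounds back to the reported percentage."""
    count = implied_count(percent, total)
    return abs(round(100 * count / total, 1) - percent) < 1e-9


# (reported percentage, assumed denominator); labels are illustrative only.
reported = {
    "English available in Spanish": (38.8, 103),
    "English baseline": (5.8, 103),
    "English ChatGPT-4.0": (21.4, 103),
    "English Google Gemini": (60.2, 103),
    "Spanish baseline": (10.0, 40),
    "Spanish ChatGPT-4.0": (52.5, 40),
    "Spanish Google Gemini": (77.5, 40),
}

for label, (percent, total) in reported.items():
    count = implied_count(percent, total)
    print(f"{label}: {count}/{total} consistent={is_consistent(percent, total)}")
```

Under these assumptions, each reported percentage corresponds to a whole number of articles, which suggests the figures are internally consistent with the stated denominators.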

CONCLUSIONS

Pediatric orthopaedic PEMs are limited by poor readability and by the low availability of Spanish versions. Medical societies and hospitals may use AI models, particularly Google Gemini, to improve readability and patient comprehension, but increasing access to Spanish PEMs is also necessary.
