Romoff Melissa, Brunette Madison, Peterson Melanie K, Hashmi Sohaib Z, Kim Michael S
Department of Orthopaedic Surgery, University of California, Irvine, School of Medicine, 101 The City Dr S, Pavilion 3, Building 29 A, Orange, CA, 92868, USA.
J Orthop Surg Res. 2025 May 28;20(1):531. doi: 10.1186/s13018-025-05955-1.
Patient education is crucial for informed decision-making. Current educational materials are often written at a higher grade level than the American Medical Association (AMA)-recommended sixth-grade level. Few studies have assessed the readability of orthopaedic materials such as American Academy of Orthopaedic Surgeons (AAOS) OrthoInfo articles, and no studies have suggested efficient methods to improve readability. This study assessed the readability of OrthoInfo spine articles and investigated the ability of large language models (LLMs) to improve readability.
A cross-sectional study analyzed 19 OrthoInfo spine articles using validated readability metrics (Flesch-Kincaid Grade Level and Flesch Reading Ease). Articles were simplified iteratively in three steps using ChatGPT, Gemini, and Copilot: each model was first prompted to summarize the text, then given two follow-up clarification prompts simulating patient inquiries. Word count, readability, and accuracy were assessed at each step. Accuracy was rated by two independent reviewers on a three-point scale (3 = fully accurate, 2 = minor inaccuracies, 1 = major inaccuracies). Statistical analysis included one-way and two-way ANOVA, followed by Tukey post-hoc tests for pairwise comparisons.
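For reference, both metrics are standard functions of mean sentence length and syllable density; the formulas below are the published definitions, not values or equations reproduced from the article itself:

```latex
% Flesch Reading Ease (higher = easier; 60-70 is roughly "plain English")
\mathrm{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

% Flesch-Kincaid Grade Level (maps the same inputs to a U.S. school grade)
\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```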
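A minimal sketch of how the three-step protocol could be scripted is shown below. Here `call_llm` is a hypothetical placeholder for each vendor's API, and the prompt wording is assumed rather than taken from the study; the readability scoring uses the real `textstat` package.

```python
import textstat  # readability metrics: pip install textstat

def call_llm(model: str, prompt: str) -> str:
    """Hypothetical stand-in for each model's chat API (ChatGPT, Gemini, Copilot)."""
    raise NotImplementedError

# Prompts paraphrasing the three-step protocol: one summary, two patient-style clarifications.
PROMPTS = [
    "Summarize this patient-education article in plain language:\n{text}",
    "A patient asks: can you explain this more simply?\n{text}",
    "A patient asks: can you clarify anything that is still confusing?\n{text}",
]

def simplify_iteratively(model: str, article: str) -> list[dict]:
    """Run the three prompts in sequence, scoring the output of each step."""
    text, results = article, []
    for step, template in enumerate(PROMPTS, start=1):
        text = call_llm(model, template.format(text=text))
        results.append({
            "step": step,
            "words": len(text.split()),
            "fkgl": textstat.flesch_kincaid_grade(text),
            "fre": textstat.flesch_reading_ease(text),
        })
    return results
```

Feeding each step's output back in as the next step's input is what makes the simplification iterative, mirroring the summary-then-clarification sequence described above.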
Baseline readability exceeded AMA recommendations, with a mean Flesch-Kincaid Grade Level of 9.5 and a mean Reading Ease score of 51.1. LLM summaries yielded statistically significant improvements in readability, with the greatest gains in the first iteration. All three LLMs performed similarly, though ChatGPT achieved statistically significant improvements in Reading Ease scores, and Gemini incorporated appropriate disclaimers most consistently. Accuracy remained stable throughout, with no evidence of hallucination or of compromised content quality or medical relevance.
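The group comparisons reported here follow a standard one-way ANOVA plus Tukey HSD workflow; the sketch below uses illustrative random data in place of the study's measurements:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative placeholder data: Reading Ease scores per model after simplification
# (the real study analyzed 19 articles; these numbers are not from the paper).
rng = np.random.default_rng(0)
scores = {
    "ChatGPT": rng.normal(75, 5, 19),
    "Gemini": rng.normal(72, 5, 19),
    "Copilot": rng.normal(71, 5, 19),
}

# One-way ANOVA across the three models
f_stat, p_value = f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey HSD post-hoc test for pairwise comparisons
values = np.concatenate(list(scores.values()))
labels = np.repeat(list(scores.keys()), 19)
print(pairwise_tukeyhsd(values, labels))
```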
LLMs effectively simplify orthopaedic patient-education content, reducing grade level, improving readability, and maintaining acceptable accuracy. Readability gains were greatest in the initial simplification step, and all models performed consistently. These findings support integrating LLMs into patient education workflows as a scalable strategy to improve health literacy, enhance patient comprehension, and promote more equitable access to medical information across diverse populations.