Picton Bryce, Andalib Saman, Spina Aidin, Camp Brandon, Solomon Sean S, Liang Jason, Chen Patrick M, Chen Jefferson W, Hsu Frank P, Oh Michael Y
Department of Neurological Surgery, University of California, Irvine, Orange, CA, USA.
School of Medicine, University of California, Irvine, Orange, CA, USA.
Int J Med Inform. 2025 Mar;195:105743. doi: 10.1016/j.ijmedinf.2024.105743. Epub 2024 Dec 1.
INTRODUCTION: The escalating complexity of medical literature necessitates tools to enhance readability for patients. This study aimed to evaluate the efficacy of ChatGPT-4 in simplifying neurology and neurosurgical abstracts and patient education materials (PEMs) while assessing content preservation using Latent Semantic Analysis (LSA).

METHODS: A total of 100 abstracts (25 each from Neurosurgery, Journal of Neurosurgery, Lancet Neurology, and JAMA Neurology) and 340 PEMs (66 from the American Association of Neurological Surgeons, 274 from the American Academy of Neurology) were transformed by a GPT-4.0 prompt requesting a 5th grade reading level. Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FKRE) scores were calculated before and after transformation. Content fidelity was assessed via LSA (scores range from 0 to 1, with 1 indicating identical topics) and by expert assessment (0-1) for a subset (n = 40). The Pearson correlation coefficient was used to compare the two assessments.

RESULTS: FKGL decreased from the 12th- to the 5th-grade level for abstracts and from the 13th to the 5th for PEMs (p < 0.001). FKRE scores showed similar improvement (p < 0.001). LSA confirmed high content similarity for abstracts (mean cosine similarity 0.746) and PEMs (mean 0.953). Expert assessment indicated a mean topic similarity of 0.775 for abstracts and 0.715 for PEMs. The Pearson coefficient between LSA and expert assessment of textual similarity was 0.598 for abstracts and -0.167 for PEMs. Segmented analysis revealed a correlation of 0.48 (p = 0.02) for texts below 450 words and -0.20 (p = 0.43) for texts above 450 words.

CONCLUSION: GPT-4.0 markedly improved the readability of medical texts, largely maintaining content integrity, as substantiated by LSA and expert evaluations. LSA emerged as a reliable tool for assessing content fidelity within moderate-length texts, but its utility diminished for longer documents, where it overestimated similarity. These findings support the potential of AI in combating low health literacy; however, the similarity scores indicate that expert validation remains crucial. Future research should strive to improve transformation precision and to develop better validation methodologies.
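For readers unfamiliar with the evaluation pipeline described above, the following is a minimal sketch, not the authors' code, of how readability scoring, LSA-based topic similarity, and the correlation with expert ratings could be computed. The library choices (textstat, scikit-learn, SciPy), the two-component SVD, and all example data are illustrative assumptions.

```python
# Sketch of the abstract's evaluation steps: readability before/after
# simplification, LSA cosine similarity as a content-fidelity proxy, and
# Pearson correlation against expert ratings. All values are illustrative.
import textstat
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import pearsonr


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid Grade Level, Flesch Reading Ease) for a text."""
    return textstat.flesch_kincaid_grade(text), textstat.flesch_reading_ease(text)


def lsa_similarity(original: str, simplified: str, n_components: int = 2) -> float:
    """Cosine similarity of the two texts in a low-rank LSA topic space.

    TF-IDF vectors are reduced with truncated SVD (the core of LSA);
    1 indicates identical topic representations, 0 indicates none shared.
    """
    tfidf = TfidfVectorizer(stop_words="english").fit_transform([original, simplified])
    # n_components must stay below the number of TF-IDF features.
    k = min(n_components, tfidf.shape[1] - 1)
    topics = TruncatedSVD(n_components=k, random_state=0).fit_transform(tfidf)
    return float(cosine_similarity(topics[:1], topics[1:])[0, 0])


# Hypothetical original/simplified pairs and expert topic-similarity ratings.
pairs = [
    ("original abstract text ...", "simplified abstract text ..."),
    ("original patient education text ...", "simplified patient education text ..."),
    ("another original passage ...", "another simplified passage ..."),
]
expert_scores = [0.80, 0.75, 0.70]

lsa_scores = [lsa_similarity(orig, simp) for orig, simp in pairs]
r, p = pearsonr(lsa_scores, expert_scores)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")
```

In the study itself the SVD would be fit over the full corpus rather than a single document pair; the per-pair reduction here only keeps the sketch self-contained.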