Picton Bryce, Andalib Saman, Spina Aidin, Camp Brandon, Solomon Sean S, Liang Jason, Chen Patrick M, Chen Jefferson W, Hsu Frank P, Oh Michael Y
Department of Neurological Surgery, University of California, Irvine, Orange, CA, USA.
School of Medicine, University of California, Irvine, Orange, CA, USA.
Int J Med Inform. 2025 Mar;195:105743. doi: 10.1016/j.ijmedinf.2024.105743. Epub 2024 Dec 1.
INTRODUCTION: The escalating complexity of medical literature necessitates tools to enhance readability for patients. This study aimed to evaluate the efficacy of ChatGPT-4 in simplifying neurology and neurosurgical abstracts and patient education materials (PEMs) while assessing content preservation using Latent Semantic Analysis (LSA).

METHODS: A total of 100 abstracts (25 each from Neurosurgery, Journal of Neurosurgery, Lancet Neurology, and JAMA Neurology) and 340 PEMs (66 from the American Association of Neurological Surgeons, 274 from the American Academy of Neurology) were transformed by a GPT-4.0 prompt requesting a 5th grade reading level. Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FKRE) scores were calculated before and after transformation. Content fidelity was assessed via LSA (scores range from 0 to 1, with 1 indicating identical topics) and by expert assessment (0-1) for a subset (n = 40). The Pearson correlation coefficient was used to compare the two assessments.

RESULTS: FKGL decreased from the 12th- to the 5th-grade level for abstracts and from the 13th to the 5th for PEMs (p < 0.001). FKRE scores showed similar improvement (p < 0.001). LSA confirmed high content similarity for abstracts (mean cosine similarity 0.746) and PEMs (mean 0.953). Expert assessment indicated a mean topic similarity of 0.775 for abstracts and 0.715 for PEMs. The Pearson coefficient between LSA and expert assessment of textual similarity was 0.598 for abstracts and -0.167 for PEMs. Segmented analysis revealed a correlation of 0.48 (p = 0.02) for texts below 450 words and -0.20 (p = 0.43) for texts above 450 words.

CONCLUSION: GPT-4.0 markedly improved the readability of medical texts, largely maintaining content integrity, as substantiated by LSA and expert evaluations. LSA emerged as a reliable tool for assessing content fidelity within moderate-length texts, but its utility diminished for longer documents, where it overestimated similarity. These findings support the potential of AI in combating low health literacy; however, the similarity scores indicate that expert validation remains crucial. Future research should strive to improve transformation precision and to develop better validation methodologies.
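For readers unfamiliar with the evaluation pipeline described above, the following is a minimal sketch, not the authors' code, of how readability scoring, LSA-based topic similarity, and the correlation with expert ratings could be computed. The library choices (textstat, scikit-learn, SciPy), the two-component SVD, and all example data are illustrative assumptions.

```python
# Sketch of the abstract's evaluation steps: readability before/after
# simplification, LSA cosine similarity as a content-fidelity proxy, and
# Pearson correlation against expert ratings. All values are illustrative.
import textstat
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import pearsonr


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid Grade Level, Flesch Reading Ease) for a text."""
    return textstat.flesch_kincaid_grade(text), textstat.flesch_reading_ease(text)


def lsa_similarity(original: str, simplified: str, n_components: int = 2) -> float:
    """Cosine similarity of the two texts in a low-rank LSA topic space.

    TF-IDF vectors are reduced with truncated SVD (the core of LSA);
    1 indicates identical topic representations, 0 indicates none shared.
    """
    tfidf = TfidfVectorizer(stop_words="english").fit_transform([original, simplified])
    # n_components must stay below the number of TF-IDF features.
    k = min(n_components, tfidf.shape[1] - 1)
    topics = TruncatedSVD(n_components=k, random_state=0).fit_transform(tfidf)
    return float(cosine_similarity(topics[:1], topics[1:])[0, 0])


# Hypothetical original/simplified pairs and expert topic-similarity ratings.
pairs = [
    ("original abstract text ...", "simplified abstract text ..."),
    ("original patient education text ...", "simplified patient education text ..."),
    ("another original passage ...", "another simplified passage ..."),
]
expert_scores = [0.80, 0.75, 0.70]

lsa_scores = [lsa_similarity(orig, simp) for orig, simp in pairs]
r, p = pearsonr(lsa_scores, expert_scores)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")
```

In the study itself the SVD would be fit over the full corpus rather than a single document pair; the per-pair reduction here only keeps the sketch self-contained.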