
Assessment of the Modified Rankin Scale in Electronic Health Records with a Fine-tuned Large Language Model.

Author Information

Silva Luis, Milani Marcus, Bindra Sohum, Ikramuddin Salman, Tessmer Megan, Frederickson Kaylee, Datta Abhigyan, Ergen Halil, Stangebye Alex, Cooper Dawson, Kumar Kompal, Yeung Jeremy, Lakshminarayan Kamakshi, Streib Christopher D

Affiliations

Department of Neurology, University of Minnesota, Minneapolis, Minnesota, United States of America.

Department of Neurology, University of Florida, Gainesville, Florida, United States of America.

Publication Information

medRxiv. 2025 May 2:2025.04.30.25326777. doi: 10.1101/2025.04.30.25326777.

Abstract

INTRODUCTION

The modified Rankin scale (mRS) is an important metric in stroke research, often used as a primary outcome in clinical trials and observational studies. The mRS can be assessed retrospectively from electronic health records (EHR), though this process is labor-intensive and prone to inter-rater variability. Large language models (LLMs) have demonstrated potential in automating clinical text classification. We hypothesize that a fine-tuned LLM can analyze EHR text and classify mRS scores for clinical and research applications.

METHODS

We performed a retrospective cohort study of patients admitted to a specialist stroke neurology service at a large academic hospital system between August 2020 and June 2023. Each patient's medical record was reviewed at two time points: (1) hospital discharge and (2) approximately 90 days post-discharge. Two independent researchers assigned an mRS score at each time point. Two separate models were trained on EHR passages with corresponding mRS scores as labeled outcomes: (1) a multiclass model to classify all seven mRS scores and (2) a binary model to classify functional independence (mRS 0-2) versus non-independence (mRS 3-6). Four-fold cross-validation was conducted, using accuracy and Cohen's kappa as model performance metrics.
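As a rough illustration of the evaluation protocol described above (not the authors' actual pipeline), the sketch below runs stratified four-fold cross-validation over labeled EHR passages and reports mean accuracy and Cohen's kappa. The TF-IDF plus logistic-regression classifier is a placeholder: the study fine-tunes a large language model whose architecture and training code are not specified in the abstract, and the quadratic kappa weighting shown is an assumption.

```python
# Sketch: stratified 4-fold cross-validation over labeled EHR passages,
# reporting accuracy and Cohen's kappa for a multiclass (mRS 0-6) classifier.
# The classifier is a TF-IDF + logistic-regression stand-in, NOT the
# fine-tuned LLM used in the paper; quadratic kappa weights are an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline


def cross_validate_mrs(passages, mrs_scores, n_splits=4):
    """passages: list of EHR text snippets; mrs_scores: matching labels 0-6."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies, kappas = [], []
    for train_idx, test_idx in skf.split(passages, mrs_scores):
        X_train = [passages[i] for i in train_idx]
        y_train = [mrs_scores[i] for i in train_idx]
        X_test = [passages[i] for i in test_idx]
        y_test = [mrs_scores[i] for i in test_idx]

        # Placeholder model; the study instead fine-tunes an LLM on this fold.
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(X_train, y_train)
        preds = model.predict(X_test)

        accuracies.append(accuracy_score(y_test, preds))
        # Weighted kappa treats mRS as ordinal; the abstract reports a weighted
        # kappa but does not name the weighting scheme.
        kappas.append(cohen_kappa_score(y_test, preds, weights="quadratic"))
    return sum(accuracies) / len(accuracies), sum(kappas) / len(kappas)
```

The binary model can reuse the same loop after recoding labels to functional independence (mRS 0-2) versus non-independence (mRS 3-6) and dropping the kappa weights.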

RESULTS

A total of 2,290 EHR passages with corresponding mRS scores were included in model training. The multiclass model, which considered all seven mRS scores, attained an accuracy of 77% and a weighted Cohen's kappa of 0.92. Class-specific accuracy was highest for mRS 4 (90%) and lowest for mRS 2 (28%). The binary model, which considered only functional independence versus non-independence, attained an accuracy of 92% and a Cohen's kappa of 0.84.
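For readers reproducing similar summaries, the snippet below shows one way to derive class-specific accuracy (per-class recall along the confusion-matrix diagonal) and the binary recoding behind the independence model. Variable and function names are illustrative assumptions, not taken from the paper.

```python
# Sketch: per-class accuracy from a confusion matrix, plus the mRS 0-2 vs 3-6
# recoding used for the binary (independence vs non-independence) model.
from sklearn.metrics import cohen_kappa_score, confusion_matrix


def per_class_accuracy(y_true, y_pred, labels=range(7)):
    """Fraction of correctly classified passages within each true mRS class."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    return {lab: cm[i, i] / cm[i].sum() for i, lab in enumerate(labels) if cm[i].sum()}


def to_binary(scores):
    """Recode mRS: 0-2 -> 0 (independent), 3-6 -> 1 (non-independent)."""
    return [0 if s <= 2 else 1 for s in scores]


# Unweighted kappa suits the two-class model, e.g.:
# binary_kappa = cohen_kappa_score(to_binary(y_true), to_binary(y_pred))
```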

CONCLUSION

Our findings demonstrate that LLMs can be successfully trained to determine mRS scores through EHR text analysis. With further advancements, fully automated LLMs could scale across large clinical datasets, enabling data-driven public health strategies and optimized resource allocation.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e9c/12060943/95d0947aee54/nihpp-2025.04.30.25326777v1-f0001.jpg
