
Assessment of the Modified Rankin Scale in Electronic Health Records with a Fine-tuned Large Language Model.

Author Information

Silva Luis, Milani Marcus, Bindra Sohum, Ikramuddin Salman, Tessmer Megan, Frederickson Kaylee, Datta Abhigyan, Ergen Halil, Stangebye Alex, Cooper Dawson, Kumar Kompal, Yeung Jeremy, Lakshminarayan Kamakshi, Streib Christopher D

Affiliations

Department of Neurology, University of Minnesota, Minneapolis, Minnesota, United States of America.

Department of Neurology, University of Florida, Gainesville, Florida, United States of America.

Publication Information

medRxiv. 2025 May 2:2025.04.30.25326777. doi: 10.1101/2025.04.30.25326777.

Abstract

INTRODUCTION

The modified Rankin scale (mRS) is an important metric in stroke research, often used as a primary outcome in clinical trials and observational studies. The mRS can be assessed retrospectively from electronic health records (EHR), though this process is labor-intensive and prone to inter-rater variability. Large language models (LLMs) have demonstrated potential in automating clinical text classification. We hypothesize that a fine-tuned LLM can analyze EHR text and classify mRS scores for clinical and research applications.

METHODS

We performed a retrospective cohort study of patients admitted to a specialist stroke neurology service at a large academic hospital system between August 2020 and June 2023. Each patient's medical record was reviewed at two time points: (1) hospital discharge and (2) approximately 90 days post-discharge. Two independent researchers assigned an mRS score at each time point. Two separate models were trained on EHR passages with corresponding mRS scores as labeled outcomes: (1) a multiclass model to classify all seven mRS scores and (2) a binary model to classify functional independence (mRS 0-2) versus non-independence (mRS 3-6). Four-fold cross-validation was conducted, using accuracy and Cohen's kappa as model performance metrics.
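As a rough illustration of the evaluation protocol described above (not the authors' actual pipeline), the sketch below runs stratified four-fold cross-validation over labeled EHR passages and reports mean accuracy and Cohen's kappa. The TF-IDF plus logistic-regression classifier is a placeholder: the study fine-tunes a large language model whose architecture and training code are not specified in the abstract, and the quadratic kappa weighting shown is an assumption.

```python
# Sketch: stratified 4-fold cross-validation over labeled EHR passages,
# reporting accuracy and Cohen's kappa for a multiclass (mRS 0-6) classifier.
# The classifier is a TF-IDF + logistic-regression stand-in, NOT the
# fine-tuned LLM used in the paper; quadratic kappa weights are an assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline


def cross_validate_mrs(passages, mrs_scores, n_splits=4):
    """passages: list of EHR text snippets; mrs_scores: matching labels 0-6."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies, kappas = [], []
    for train_idx, test_idx in skf.split(passages, mrs_scores):
        X_train = [passages[i] for i in train_idx]
        y_train = [mrs_scores[i] for i in train_idx]
        X_test = [passages[i] for i in test_idx]
        y_test = [mrs_scores[i] for i in test_idx]

        # Placeholder model; the study instead fine-tunes an LLM on this fold.
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(X_train, y_train)
        preds = model.predict(X_test)

        accuracies.append(accuracy_score(y_test, preds))
        # Weighted kappa treats mRS as ordinal; the abstract reports a weighted
        # kappa but does not name the weighting scheme.
        kappas.append(cohen_kappa_score(y_test, preds, weights="quadratic"))
    return sum(accuracies) / len(accuracies), sum(kappas) / len(kappas)
```

The binary model can reuse the same loop after recoding labels to functional independence (mRS 0-2) versus non-independence (mRS 3-6) and dropping the kappa weights.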

RESULTS

A total of 2,290 EHR passages with corresponding mRS scores were included in model training. The multiclass model, which considered all seven mRS scores, attained an accuracy of 77% and a weighted Cohen's kappa of 0.92. Class-specific accuracy was highest for mRS 4 (90%) and lowest for mRS 2 (28%). The binary model, which considered only functional independence versus non-independence, attained an accuracy of 92% and a Cohen's kappa of 0.84.
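For readers reproducing similar summaries, the snippet below shows one way to derive class-specific accuracy (per-class recall along the confusion-matrix diagonal) and the binary recoding behind the independence model. Variable and function names are illustrative assumptions, not taken from the paper.

```python
# Sketch: per-class accuracy from a confusion matrix, plus the mRS 0-2 vs 3-6
# recoding used for the binary (independence vs non-independence) model.
from sklearn.metrics import cohen_kappa_score, confusion_matrix


def per_class_accuracy(y_true, y_pred, labels=range(7)):
    """Fraction of correctly classified passages within each true mRS class."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    return {lab: cm[i, i] / cm[i].sum() for i, lab in enumerate(labels) if cm[i].sum()}


def to_binary(scores):
    """Recode mRS: 0-2 -> 0 (independent), 3-6 -> 1 (non-independent)."""
    return [0 if s <= 2 else 1 for s in scores]


# Unweighted kappa suits the two-class model, e.g.:
# binary_kappa = cohen_kappa_score(to_binary(y_true), to_binary(y_pred))
```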

CONCLUSION

Our findings demonstrate that LLMs can be successfully trained to determine mRS scores through EHR text analysis. With further advancements, fully automated LLMs could scale across large clinical datasets, enabling data-driven public health strategies and optimized resource allocation.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e9c/12060943/95d0947aee54/nihpp-2025.04.30.25326777v1-f0001.jpg
