Assessment of the Modified Rankin Scale in Electronic Health Records with a Fine-tuned Large Language Model.

Authors

Silva Luis, Milani Marcus, Bindra Sohum, Ikramuddin Salman, Tessmer Megan, Frederickson Kaylee, Datta Abhigyan, Ergen Halil, Stangebye Alex, Cooper Dawson, Kumar Kompal, Yeung Jeremy, Lakshminarayan Kamakshi, Streib Christopher D

Affiliations

Department of Neurology, University of Minnesota, Minneapolis, Minnesota, United States of America.

Department of Neurology, University of Florida, Gainesville, Florida, United States of America.

Publication information

medRxiv. 2025 May 2:2025.04.30.25326777. doi: 10.1101/2025.04.30.25326777.

DOI: 10.1101/2025.04.30.25326777
PMID: 40343036
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12060943/
Abstract

INTRODUCTION

The modified Rankin scale (mRS) is an important metric in stroke research, often used as a primary outcome in clinical trials and observational studies. The mRS can be assessed retrospectively from electronic health records (EHR), though this process is labor-intensive and prone to inter-rater variability. Large language models (LLMs) have demonstrated potential in automating clinical text classification. We hypothesize that a fine-tuned LLM can analyze EHR text and classify mRS scores for clinical and research applications.

METHODS

We performed a retrospective cohort study of patients admitted to a specialist stroke neurology service at a large academic hospital system between August 2020 and June 2023. Each patient's medical record was reviewed at two time points: (1) hospital discharge and (2) approximately 90 days post-discharge. Two independent researchers assigned an mRS score at each time point. Two separate models were trained on EHR passages with corresponding mRS scores as labeled outcomes: (1) a multiclass model to classify all seven mRS scores and (2) a binary model to classify functional independence (mRS 0-2) versus non-independence (mRS 3-6). Four-fold cross-validation was conducted, using accuracy and Cohen's kappa as model performance metrics.
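
The labeling and validation setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `dichotomize_mrs` and `k_fold_indices` are hypothetical, and the abstract does not say how passages were assigned to folds (for example, whether passages from the same patient were kept together), so a plain shuffled split is assumed here.

```python
import random


def dichotomize_mrs(score: int) -> int:
    """Map an mRS score (0-6) to the binary label used by the binary model.

    Returns 1 for functional independence (mRS 0-2) and 0 otherwise (mRS 3-6).
    """
    if not 0 <= score <= 6:
        raise ValueError(f"mRS score must be in 0..6, got {score}")
    return 1 if score <= 2 else 0


def k_fold_indices(n_items: int, k: int = 4, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: items are shuffled once and dealt into k folds of
    near-equal size; each fold serves as the test set exactly once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    mrs_scores = [0, 2, 3, 6, 4, 1, 5, 2]
    print([dichotomize_mrs(s) for s in mrs_scores])
    for train_idx, test_idx in k_fold_indices(len(mrs_scores), k=4):
        print(sorted(test_idx), len(train_idx))
```

If several passages come from the same patient, a grouped split that keeps each patient within a single fold would avoid leakage between training and test data; whether the study did this is not stated in the abstract.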

RESULTS

A total of 2,290 EHR passages with corresponding mRS scores were included in model training. The multiclass model, which considers all seven mRS scores, attained an accuracy of 77% and a weighted Cohen's kappa of 0.92. Class-specific accuracy was highest for mRS 4 (90%) and lowest for mRS 2 (28%). The binary model, which considers only functional independence versus non-independence, attained an accuracy of 92% and a Cohen's kappa of 0.84.
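
For readers unfamiliar with the reported metrics, the sketch below computes accuracy, Cohen's kappa, and class-specific accuracy from paired label lists. Several points are assumptions rather than details from the study: the abstract does not state which weighting scheme the weighted kappa uses (quadratic weights are shown as one common choice), "class-specific accuracy" is interpreted as the fraction of passages with a given true score that were classified correctly, and the toy labels are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred, n_classes, weights=None):
    """Cohen's kappa for integer labels in 0..n_classes-1.

    weights=None gives the unweighted kappa; weights="quadratic" penalizes
    disagreements by squared distance between classes (an assumed scheme).
    """
    n = len(y_true)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def penalty(i, j):
        if weights == "quadratic":
            return (i - j) ** 2 / (n_classes - 1) ** 2
        return 0.0 if i == j else 1.0

    pairs = [(i, j) for i in range(n_classes) for j in range(n_classes)]
    observed_disagreement = sum(penalty(i, j) * observed[i][j] for i, j in pairs)
    expected_disagreement = sum(penalty(i, j) * row[i] * col[j] for i, j in pairs)
    if expected_disagreement == 0:
        return float("nan")  # undefined when all labels fall in one class
    return 1 - observed_disagreement / expected_disagreement


def per_class_accuracy(y_true, y_pred):
    """For each true class, the fraction of its items predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in sorted(totals)}


if __name__ == "__main__":
    # Toy three-class labels, not data from the study.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print(accuracy(y_true, y_pred))
    print(cohen_kappa(y_true, y_pred, n_classes=3))
    print(cohen_kappa(y_true, y_pred, n_classes=3, weights="quadratic"))
    print(per_class_accuracy(y_true, y_pred))
```

A weighted kappa credits near misses on an ordinal scale, which is one way a multiclass model can show high weighted agreement alongside lower exact-match accuracy. In practice, scikit-learn's `cohen_kappa_score` offers the same calculation through its `weights` parameter.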

CONCLUSION

Our findings demonstrate that LLMs can be successfully trained to determine mRS scores through EHR text analysis. With further advancements, fully automated LLMs could scale across large clinical datasets, enabling data-driven public health strategies and optimized resource allocation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e9c/12060943/0045a27ebe5c/nihpp-2025.04.30.25326777v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e9c/12060943/95d0947aee54/nihpp-2025.04.30.25326777v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e9c/12060943/a12f0015a1c1/nihpp-2025.04.30.25326777v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e9c/12060943/d1206fa8bedf/nihpp-2025.04.30.25326777v1-f0003.jpg
