• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

使用大语言模型从多机构临床记录中自动提取言语和行动能力的功能生物标志物。

Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models.

作者信息

Kaster Levi, Hillis Ethan, Oh Inez Y, Aravamuthan Bhooma R, Lanzotti Virginia C, Vickstrom Casey R, Gurnett Christina A, Payne Philip R O, Gupta Aditi

机构信息

Institute for Informatics, Data Science and Biostatistics, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.

Department of Neurology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA.

出版信息

J Neurodev Disord. 2025 Apr 30;17(1):24. doi: 10.1186/s11689-025-09612-w.

DOI:10.1186/s11689-025-09612-w
PMID:40307685
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC12042395/
Abstract

BACKGROUND

Functional biomarkers in neurodevelopmental disorders, such as verbal and ambulatory abilities, are essential for clinical care and research activities. Treatment planning, intervention monitoring, and identifying comorbid conditions in individuals with intellectual and developmental disabilities (IDDs) rely on standardized assessments of these abilities. However, traditional assessments impose a burden on patients and providers, often leading to longitudinal inconsistencies and inequities due to evolving guidelines and associated time-cost. Therefore, this study aimed to develop an automated approach to classify verbal and ambulatory abilities from EHR data of IDD and cerebral palsy (CP) patients. Application of large language models (LLMs) to clinical notes, which are rich in longitudinal data, may provide a low-burden pipeline for extracting functional biomarkers efficiently and accurately.

METHODS

Data from the multi-institutional National Brain Gene Registry (BGR) and a CP clinic cohort were utilized, comprising 3,245 notes from 125 individuals and 5,462 clinical notes from 260 individuals, respectively. Employing three LLMs-GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4 Omni-we provided the models with a clinical note and utilized a detailed conversational format to prompt the models to answer: "Does the individual use any words?" and "Can the individual walk without aid?" These responses were evaluated against ground-truth abilities, which were established using neurobehavioral assessments collected for each dataset.

RESULTS

LLM pipelines demonstrated high accuracy (weighted-F1 scores > .90) in predicting ambulatory ability for both cohorts, likely due to the consistent use of Gross Motor Functional Classification System (GMFCS) as a consistent ground-truth standard. However, verbal ability predictions were more accurate in the BGR cohort, likely due to higher adherence between the prompt and ground-truth assessment questions. While LLMs can be computationally expensive, analysis of our protocol affirmed the cost effectiveness when applied to select notes from the EHR.

CONCLUSIONS

LLMs are effective at extracting functional biomarkers from EHR data and broadly generalizable across variable note-taking practices and institutions. Individual verbal and ambulatory ability were accurately extracted, supporting the method's ability to streamline workflows by offering automated, efficient data extraction for patient care and research. Future studies are needed to extend this methodology to additional populations and to demonstrate more granular functional data classification.

摘要

背景

神经发育障碍中的功能生物标志物,如言语和行走能力,对于临床护理和研究活动至关重要。智力和发育障碍(IDD)患者的治疗计划、干预监测以及共病情况的识别依赖于对这些能力的标准化评估。然而,传统评估给患者和医护人员带来负担,由于指南不断演变以及相关的时间成本,常常导致纵向不一致和不公平。因此,本研究旨在开发一种自动化方法,从IDD和脑瘫(CP)患者的电子健康记录(EHR)数据中对言语和行走能力进行分类。将大语言模型(LLM)应用于富含纵向数据的临床记录,可能为高效、准确地提取功能生物标志物提供一种低负担的途径。

方法

利用来自多机构的国家脑基因登记处(BGR)和一个CP临床队列的数据,分别包括来自125名个体的3245份记录和来自260名个体的5462份临床记录。我们使用三个LLM——GPT-3.5 Turbo、GPT-4 Turbo和GPT-4 Omni——为模型提供一份临床记录,并采用详细的对话形式促使模型回答:“该个体是否使用任何词语?”以及“该个体能否独立行走?”将这些回答与通过为每个数据集收集的神经行为评估确定的真实能力进行比较。

结果

LLM流程在预测两个队列的行走能力方面显示出高准确率(加权F1分数>.90),这可能是由于一致使用粗大运动功能分类系统(GMFCS)作为一致的真实标准。然而,言语能力预测在BGR队列中更准确,这可能是由于提示与真实评估问题之间的更高一致性。虽然LLM计算成本可能很高,但对我们方案的分析证实,当应用于从EHR中选择的记录时具有成本效益。

结论

LLM在从EHR数据中提取功能生物标志物方面有效,并且在不同的记录方式和机构中具有广泛的通用性。个体的言语和行走能力被准确提取,支持该方法通过为患者护理和研究提供自动化、高效的数据提取来简化工作流程的能力。未来需要开展研究,将这种方法扩展到更多人群,并展示更细致的功能数据分类。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/f08dc633cccc/11689_2025_9612_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/5ec0f5e92b64/11689_2025_9612_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/1957817a83e4/11689_2025_9612_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/e8f8ae253c2c/11689_2025_9612_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/f08dc633cccc/11689_2025_9612_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/5ec0f5e92b64/11689_2025_9612_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/1957817a83e4/11689_2025_9612_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/e8f8ae253c2c/11689_2025_9612_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5834/12042395/f08dc633cccc/11689_2025_9612_Fig4_HTML.jpg

相似文献

1
Automated extraction of functional biomarkers of verbal and ambulatory ability from multi-institutional clinical notes using large language models.使用大语言模型从多机构临床记录中自动提取言语和行动能力的功能生物标志物。
J Neurodev Disord. 2025 Apr 30;17(1):24. doi: 10.1186/s11689-025-09612-w.
2
Prescription of Controlled Substances: Benefits and Risks管制药品的处方:益处与风险
3
The potential of Generative Pre-trained Transformer 4 (GPT-4) to analyse medical notes in three different languages: a retrospective model-evaluation study.生成式预训练变换器4(GPT-4)分析三种不同语言医学笔记的潜力:一项回顾性模型评估研究。
Lancet Digit Health. 2025 Jan;7(1):e35-e43. doi: 10.1016/S2589-7500(24)00246-2.
4
Improving Large Language Models' Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation.通过在出院小结中添加重点内容提高大语言模型的总结准确性:比较评估
JMIR Med Inform. 2025 Jul 24;13:e66476. doi: 10.2196/66476.
5
Large Language Model Symptom Identification From Clinical Text: Multicenter Study.基于临床文本的大语言模型症状识别:多中心研究。
J Med Internet Res. 2025 Jul 31;27:e72984. doi: 10.2196/72984.
6
Sexual Harassment and Prevention Training性骚扰与预防培训
7
Automated devices for identifying peripheral arterial disease in people with leg ulceration: an evidence synthesis and cost-effectiveness analysis.用于识别下肢溃疡患者外周动脉疾病的自动化设备:证据综合和成本效益分析。
Health Technol Assess. 2024 Aug;28(37):1-158. doi: 10.3310/TWCG3912.
8
MarkVCID cerebral small vessel consortium: I. Enrollment, clinical, fluid protocols.马克 VCID 脑小血管联盟:一、入组、临床、液体方案。
Alzheimers Dement. 2021 Apr;17(4):704-715. doi: 10.1002/alz.12215. Epub 2021 Jan 21.
9
Short-Term Memory Impairment短期记忆障碍
10
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.社区居住的老年人跌倒预防干预措施:系统评价和荟萃分析的益处、危害以及患者的价值观和偏好。
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.

本文引用的文献

1
Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods.利用GPT-4在电子健康记录中识别癌症表型:GPT-4、GPT-3.5-turbo、Flan-T5、Llama-3-8B与spaCy基于规则和基于机器学习的方法之间的性能比较。
JAMIA Open. 2024 Jul 3;7(3):ooae060. doi: 10.1093/jamiaopen/ooae060. eCollection 2024 Oct.
2
A critical assessment of using ChatGPT for extracting structured data from clinical notes.对使用ChatGPT从临床记录中提取结构化数据的批判性评估。
NPJ Digit Med. 2024 May 1;7(1):106. doi: 10.1038/s41746-024-01079-8.
3
A Comparison Between GPT-3.5, GPT-4, and GPT-4V: Can the Large Language Model (ChatGPT) Pass the Japanese Board of Orthopaedic Surgery Examination?
GPT-3.5、GPT-4和GPT-4V之间的比较:大型语言模型(ChatGPT)能通过日本骨科手术委员会考试吗?
Cureus. 2024 Mar 18;16(3):e56402. doi: 10.7759/cureus.56402. eCollection 2024 Mar.
4
The Brain Gene Registry: a data snapshot.大脑基因登记处:数据快照。
J Neurodev Disord. 2024 Apr 17;16(1):17. doi: 10.1186/s11689-024-09530-3.
5
Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study.GPT-3.5 和 GPT-4 与医学生在书面德语文凭考试中的表现比较:观察性研究。
JMIR Med Educ. 2024 Feb 8;10:e50965. doi: 10.2196/50965.
6
Clinical variants paired with phenotype: A rich resource for brain gene curation.临床变异与表型配对:大脑基因整理的丰富资源。
Genet Med. 2024 Mar;26(3):101035. doi: 10.1016/j.gim.2023.101035. Epub 2023 Dec 4.
7
Extracting cancer concepts from clinical notes using natural language processing: a systematic review.使用自然语言处理从临床笔记中提取癌症概念:系统评价。
BMC Bioinformatics. 2023 Oct 29;24(1):405. doi: 10.1186/s12859-023-05480-0.
8
Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing.使用自然语言处理技术从临床记录中提取阿尔茨海默病痴呆的临床表型。
JAMIA Open. 2023 Feb 24;6(1):ooad014. doi: 10.1093/jamiaopen/ooad014. eCollection 2023 Apr.
9
Implementation of Collection of Patients' Disability Status by Centralized Scheduling.集中调度采集患者残疾状况的实施。
Jt Comm J Qual Patient Saf. 2021 Oct;47(10):627-636. doi: 10.1016/j.jcjq.2021.05.007. Epub 2021 May 24.
10
Intellectual and developmental disabilities research centers: Fifty years of scientific accomplishments.智力和发育障碍研究中心:五十年的科学成就。
Ann Neurol. 2019 Sep;86(3):332-343. doi: 10.1002/ana.25531. Epub 2019 Jul 27.