Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing.

Authors

Oh Inez Y, Schindler Suzanne E, Ghoshal Nupur, Lai Albert M, Payne Philip R O, Gupta Aditi

Affiliations

Institute for Informatics, Washington University School of Medicine, St. Louis, Missouri, USA.

Department of Neurology, Washington University School of Medicine, St. Louis, Missouri, USA.

Publication information

JAMIA Open. 2023 Feb 24;6(1):ooad014. doi: 10.1093/jamiaopen/ooad014. eCollection 2023 Apr.

DOI: 10.1093/jamiaopen/ooad014
PMID: 36844369
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9952043/

Abstract

OBJECTIVES

There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR.

MATERIALS AND METHODS

We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings.
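As a rough illustration of what extracting one phenotype category (neurobehavioral test scores) from free text can look like, here is a minimal rule-based sketch. It is not the authors' pipeline: the abstract does not describe the implementation, and the test names (MMSE, MoCA), the regular expression, and the function name `extract_test_scores` are all assumptions chosen for the example.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# a test name, up to 30 non-digit filler characters, then a 1-2 digit score,
# optionally followed by "/<max>" as in "24/30".
SCORE_PATTERN = re.compile(
    r"\b(?P<test>MMSE|MoCA)\b[^0-9\n]{0,30}?(?P<score>\d{1,2})(?:\s*/\s*\d{1,2})?",
    re.IGNORECASE,
)

# Canonical spellings for the test names matched case-insensitively above.
CANONICAL = {"mmse": "MMSE", "moca": "MoCA"}

# Both MMSE and MoCA are scored from 0 to 30.
MAX_SCORE = 30


def extract_test_scores(note):
    """Return (test_name, score) pairs found in a clinical note string."""
    results = []
    for match in SCORE_PATTERN.finditer(note):
        score = int(match.group("score"))
        if 0 <= score <= MAX_SCORE:  # drop implausible values
            results.append((CANONICAL[match.group("test").lower()], score))
    return results


example_note = "Pt seen in memory clinic. MMSE score was 24/30 today. MoCA: 19."
print(extract_test_scores(example_note))
```

A rule like this is deliberately narrow: it only recognizes the two named tests and a single phrasing pattern, and it would need negation handling, date association, and many more variants before it could be compared against expert annotations.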

RESULTS

Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype.
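The two metrics reported above can be computed as in the following sketch. It assumes binary present/absent labels per note for a single phenotype; the label lists are made-up toy data, not values from the study, and the function names are illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive=1):
    """F1 of predicted labels against gold labels for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): 1 = phenotype documented in the note, 0 = not documented.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
pipeline_output = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]

print(round(cohens_kappa(annotator_a, annotator_b), 2))
print(round(f1_score(annotator_a, pipeline_output), 2))
```

For these toy labels the annotators agree on 9 of 10 notes with a chance agreement of 0.5, giving a kappa of 0.8, and the pipeline output has precision and recall of 5/6 each, giving an F1 of about 0.83.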

DISCUSSION

We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success.

CONCLUSION

Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
