Automated phenotyping of mild cognitive impairment and Alzheimer's disease and related dementias using electronic health records.

Author information

Wei Ruoqi, Buss Stephanie S, Milde Rebecca, Fernandes Marta, Sumsion Daniel, Davis Elijah, Kong Wan-Yee, Xiong Yiwen, Veltink Jet, Rao Samvrit, Westover Tara M, Petersen Lydia, Turley Niels, Singh Arjun, Das Sudeshna, Junior Valdery Moura, Ghanta Manohar, Gupta Aditya, Kim Jennifer, Lam Alice D, Stone Katie L, Mignot Emmanuel, Hwang Dennis, Trotti Lynn Marie, Clifford Gari D, Katwa Umakanth, Thomas Robert J, Mukerji Shibani, Zafar Sahar F, Westover M Brandon, Sun Haoqi

Affiliations

Beth Israel Deaconess Medical Center, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, FL, USA.

Beth Israel Deaconess Medical Center, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.

Publication information

Int J Med Inform. 2025 Aug;200:105917. doi: 10.1016/j.ijmedinf.2025.105917. Epub 2025 Apr 11.

Abstract

OBJECTIVES

Unstructured and structured data in electronic health records (EHR) are a rich source of information for research and quality improvement studies. However, extracting accurate information from EHR is labor-intensive. Timely and accurate identification of patients with Alzheimer's disease and related dementias (ADRD) or mild cognitive impairment (MCI) is critical for improving patient outcomes through early intervention, optimizing care plans, and reducing healthcare system burdens. Here we introduce an automated EHR phenotyping model to streamline this process and enable efficient identification of these conditions.

METHODS

We analyzed data from 3,626 outpatients seen at two hospitals between February 2015 and June 2022. Through manual chart review, we established ground truth labels for the presence or absence of MCI/ADRD diagnoses. Our model combined three types of data: (1) unstructured clinical notes, from which we extracted single words (unigrams), two-word phrases (bigrams), and three-word phrases (trigrams) as features, weighted using term frequency-inverse document frequency (TF-IDF) to capture their relative importance; (2) International Classification of Diseases (ICD) codes; and (3) medication prescriptions related to MCI/ADRD. We trained a regularized logistic regression model to predict MCI/ADRD diagnoses and evaluated its performance using standard metrics including area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, specificity, precision, recall, and F1 score.
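The feature pipeline described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy notes and labels are invented, the ICD and medication inputs are assumed to be single binary indicators (the abstract does not say how they were encoded), and the regularization is assumed to be an L2 penalty fitted by plain gradient descent. All function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(text, n_max=3):
    """Return all word n-grams of length 1..n_max (unigrams to trigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_tfidf(notes):
    """Learn a vocabulary and smoothed IDF weights from the training notes."""
    doc_freq = Counter()
    for note in notes:
        doc_freq.update(set(ngrams(note)))
    n_docs = len(notes)
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf

def tfidf_vector(note, vocab, idf):
    """TF-IDF vector for one note, scaled to unit L2 norm."""
    counts = Counter(ngrams(note))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_features(note, icd_flag, med_flag, vocab, idf):
    """Assumption: ICD codes and medications enter as two binary indicators."""
    return tfidf_vector(note, vocab, idf) + [float(icd_flag), float(med_flag)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Predicted probability of an MCI/ADRD label for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X, y, l2=0.01, lr=0.5, epochs=500):
    """Batch gradient descent on the L2-regularized logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]   # gradient of the L2 penalty
        grad_b = 0.0                     # the intercept is not penalized
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Invented toy data: (note text, ICD indicator, medication indicator, label).
train = [
    ("patient reports progressive memory loss and mild cognitive impairment", 1, 1, 1),
    ("family notes worsening memory loss consistent with dementia", 1, 0, 1),
    ("follow up for seasonal allergies no cognitive complaints", 0, 0, 0),
    ("routine visit for knee pain memory intact", 0, 0, 0),
]
vocab, idf = fit_tfidf([row[0] for row in train])
X = [build_features(note, icd, med, vocab, idf) for note, icd, med, _ in train]
y = [row[3] for row in train]
w, b = train_logreg(X, y)
probs = [predict_proba(x, w, b) for x in X]
```

At realistic scale the same steps are usually delegated to a library; for example, scikit-learn's `TfidfVectorizer(ngram_range=(1, 3))` followed by `LogisticRegression` covers the text features and the classifier, though the abstract does not state which tooling the authors used.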

RESULTS

Thirty percent of patients in the cohort carried diagnoses of MCI/ADRD based on manual review. When evaluated on a held-out test set, the best model, which used clinical notes, ICD codes, and medications, achieved an AUROC of 0.98, an AUPRC of 0.98, an accuracy of 0.93, a sensitivity (recall) of 0.91, a specificity of 0.96, a precision of 0.96, and an F1 score of 0.93. The estimated overall accuracy for patients randomly selected from EHRs was 99.88%.
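For readers less familiar with these metrics, the sketch below shows how the threshold-based ones follow from confusion-matrix counts, and checks that the reported F1 score is consistent with the reported precision and recall. The counts are hypothetical and chosen only for illustration; the abstract does not report the test-set confusion matrix, and the function names are not from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only (not taken from the paper).
metrics = classification_metrics(tp=91, fp=4, tn=96, fn=9)

# Consistency check using only the values reported in the abstract:
# precision 0.96 and recall 0.91 give an F1 of about 0.934.
reported_f1 = f1_from(0.96, 0.91)
```

AUROC and AUPRC are not covered here because they summarize performance across all decision thresholds and need the model's predicted probabilities rather than a single set of counts.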

CONCLUSION

Automated EHR phenotyping accurately identifies patients with MCI/ADRD based on clinical notes, ICD codes, and medication records. This approach holds potential for large-scale MCI/ADRD research utilizing EHR databases.
