Natural language processing to identify lupus nephritis phenotype in electronic health records.

Author information

Deng Yu, Pacheco Jennifer A, Ghosh Anika, Chung Anh, Mao Chengsheng, Smith Joshua C, Zhao Juan, Wei Wei-Qi, Barnado April, Dorn Chad, Weng Chunhua, Liu Cong, Cordon Adam, Yu Jingzhi, Tedla Yacob, Kho Abel, Ramsey-Goldman Rosalind, Walunas Theresa, Luo Yuan

Affiliations

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Mar 3;22(Suppl 2):348. doi: 10.1186/s12911-024-02420-7.

DOI:10.1186/s12911-024-02420-7

PMID:38433189

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10910523/

Abstract

BACKGROUND

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remissions and by diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large observational cohort studies and clinical trials, where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and narratives of prior medical history, requires sophisticated text processing to extract from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW).
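
To make "recognized through procedure codes and structured data" concrete, here is a minimal sketch of a structured-data-only rule, assuming the cohort is already restricted to SLE patients. The code set, the urine protein/creatinine ratio (UPCR) threshold, the "at least two high results" condition, and the names `Patient` and `structured_rule` are all illustrative assumptions, not the study's baseline definition.

```python
from dataclasses import dataclass, field

# Illustrative code set, NOT the authors' baseline definition:
# ICD-10-CM M32.14 = glomerular disease in SLE; CPT 50200 = percutaneous renal biopsy.
LN_CODES = {"M32.14", "50200"}
# Assumed UPCR cut-off (g/g) for counting a lab result as "high".
UPCR_THRESHOLD = 0.5


@dataclass
class Patient:
    codes: set = field(default_factory=set)          # diagnosis and procedure codes
    upcr_values: list = field(default_factory=list)  # urine protein/creatinine ratios


def structured_rule(p: Patient, min_high_labs: int = 2) -> bool:
    """Hypothetical structured-data-only rule for lupus nephritis.

    Flags a patient who has any code from LN_CODES, or at least
    `min_high_labs` UPCR results at or above UPCR_THRESHOLD.
    """
    has_code = bool(p.codes & LN_CODES)
    high_labs = sum(1 for v in p.upcr_values if v >= UPCR_THRESHOLD)
    return has_code or high_labs >= min_high_labs


print(structured_rule(Patient(codes={"M32.14"})))   # True
print(structured_rule(Patient(upcr_values=[0.8])))  # False: only one high result
```

A rule like this is transparent and portable, but by construction it cannot see evidence that lives only in biopsy pathology text or history narratives, which is the gap the NLP models target.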

METHODS

We developed five algorithms: a rule-based algorithm using only structured data (the baseline algorithm) and four algorithms using different NLP models. The first NLP model combined a simple regular-expression keyword search with structured data. The other three NLP models were based on regularized logistic regression and used different feature sets: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
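
The three feature sets and the classifier can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: CUI lists are assumed to come from an upstream concept mapper such as MetaMap, the CUI strings (`CUI_LN`, etc.), regex patterns, and note snippets are hypothetical placeholders, and the L2 penalty fitted by batch gradient descent is an assumption, since the abstract says only "regularized".

```python
import math
import re
from collections import Counter

# Hypothetical regex concepts; the study's actual patterns are not given in the abstract.
REGEX_CONCEPTS = {
    "ln_mention": re.compile(r"\blupus\s+nephritis\b", re.IGNORECASE),
    "biopsy_class": re.compile(r"\bclass\s+(?:III|IV|V)\b", re.IGNORECASE),  # e.g. "class IV"
}


def binary_cui_features(cuis, vocab):
    """Positive mention: 1.0 if the CUI occurs at least once for the patient."""
    present = set(cuis)
    return [1.0 if c in present else 0.0 for c in vocab]


def count_cui_features(cuis, vocab):
    """Number of appearances of each CUI across the patient's notes."""
    counts = Counter(cuis)
    return [float(counts[c]) for c in vocab]


def mixed_features(cuis, text, structured_flags, curated_vocab):
    """Curated CUIs + regex concept hits + structured-data flags, concatenated."""
    regex_hits = [1.0 if pat.search(text) else 0.0 for pat in REGEX_CONCEPTS.values()]
    return binary_cui_features(cuis, curated_vocab) + regex_hits + list(structured_flags)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """L2-regularized logistic regression fitted by batch gradient descent.

    The bias term is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (grad_w[j] / n + l2 * wj) for j, wj in enumerate(w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold


# Toy usage with hypothetical CUIs, note snippets, and one structured flag.
CURATED = ["CUI_LN", "CUI_BIOPSY"]
patients = [
    (["CUI_LN", "CUI_LN", "CUI_BIOPSY"], "Biopsy-proven lupus nephritis, class IV.", [1.0], 1),
    (["CUI_LN"], "History of lupus nephritis.", [0.0], 1),
    (["CUI_HTN"], "Essential hypertension.", [0.0], 0),
    ([], "SLE with arthralgia only.", [0.0], 0),
]
X = [mixed_features(cuis, text, flags, CURATED) for cuis, text, flags, _ in patients]
y = [label for *_, label in patients]
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])  # [True, True, False, False]
```

The only difference between the first two CUI models is the encoding (`binary_cui_features` vs `count_cui_features`); the mixed model swaps the full CUI vocabulary for a curated subset and appends regex and structured-data columns. A real implementation would more likely use a library solver and tune the penalty strength by cross-validation.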

RESULTS

Our best-performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs), and improved the F-measure over the baseline lupus nephritis algorithm in both the NMEDW dataset (baseline 0.41 vs 0.79) and the VUMC dataset (baseline 0.52 vs 0.93).
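
The F-measure is the harmonic mean of precision and recall. A minimal sketch of the computation from confusion counts, assuming the balanced F1 (the abstract does not state the β value); the counts below are invented for illustration and are not the study's data.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts; beta=1.0 gives the balanced F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: a precise but low-recall rule vs a balanced classifier.
print(round(f_measure(tp=9, fp=1, fn=21), 2))  # 0.45
print(round(f_measure(tp=8, fp=2, fn=2), 2))   # 0.8
```

The first call shows why a rule can score poorly even at 90% precision: missing cases documented only in free text drags recall, and with it the harmonic mean, down.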

CONCLUSION

Our NLP MetaMap mixed model substantially improved the F-measure compared with the structured-data-only algorithm in both the internal and external validation datasets. NLP algorithms can serve as powerful tools for accurately identifying the lupus nephritis phenotype in EHRs, supporting clinical research and better-targeted therapies.

Similar articles

Word2Vec inversion and traditional text classifiers for phenotyping lupus.

BMC Med Inform Decis Mak. 2017 Aug 22;17(1):126. doi: 10.1186/s12911-017-0518-1.

Identifying lupus patients in electronic health records: Development and validation of machine learning algorithms and application of rule-based algorithms.

Semin Arthritis Rheum. 2019 Aug;49(1):84-90. doi: 10.1016/j.semarthrit.2019.01.002. Epub 2019 Jan 4.

Using a Multi-Institutional Pediatric Learning Health System to Identify Systemic Lupus Erythematosus and Lupus Nephritis: Development and Validation of Computable Phenotypes.

Clin J Am Soc Nephrol. 2022 Jan;17(1):65-74. doi: 10.2215/CJN.07810621. Epub 2021 Nov 3.

Leveraging Transformers-based models and linked data for deep phenotyping in radiology.

Comput Methods Programs Biomed. 2025 Mar;260:108567. doi: 10.1016/j.cmpb.2024.108567. Epub 2025 Jan 3.

Using natural language processing to identify opioid use disorder in electronic health record data.

Int J Med Inform. 2023 Feb;170:104963. doi: 10.1016/j.ijmedinf.2022.104963. Epub 2022 Dec 10.

Automated feature selection of predictors in electronic medical records data.

Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.

Augmented intelligence with natural language processing applied to electronic health records for identifying patients with non-alcoholic fatty liver disease at risk for disease progression.

Int J Med Inform. 2019 Sep;129:334-341. doi: 10.1016/j.ijmedinf.2019.06.028. Epub 2019 Jul 6.

Development of a natural language processing algorithm to detect chronic cough in electronic health records.

BMC Pulm Med. 2022 Jun 28;22(1):256. doi: 10.1186/s12890-022-02035-6.

Ensembles of natural language processing systems for portable phenotyping solutions.

J Biomed Inform. 2019 Dec;100:103318. doi: 10.1016/j.jbi.2019.103318. Epub 2019 Oct 23.

Cited by

Artificial intelligence and natural language processing for improved telemedicine: Before, during and after remote consultation.

Aten Primaria. 2025 Feb 15;57(8):103228. doi: 10.1016/j.aprim.2025.103228.

Advancing rheumatology with natural language processing: insights and prospects from a systematic review.

Rheumatol Adv Pract. 2024 Sep 19;8(4):rkae120. doi: 10.1093/rap/rkae120. eCollection 2024.

Comparison of State-of-the-Art Neural Network Survival Models with the Pooled Cohort Equations for Cardiovascular Disease Risk Prediction.

BMC Med Res Methodol. 2023 Jan 24;23(1):22. doi: 10.1186/s12874-022-01829-w.
