Natural language processing to identify lupus nephritis phenotype in electronic health records.

Authors

Deng Yu, Pacheco Jennifer A, Ghosh Anika, Chung Anh, Mao Chengsheng, Smith Joshua C, Zhao Juan, Wei Wei-Qi, Barnado April, Dorn Chad, Weng Chunhua, Liu Cong, Cordon Adam, Yu Jingzhi, Tedla Yacob, Kho Abel, Ramsey-Goldman Rosalind, Walunas Theresa, Luo Yuan

Affiliations

Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Mar 3;22(Suppl 2):348. doi: 10.1186/s12911-024-02420-7.

Abstract

BACKGROUND

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to extract from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW).

METHODS

We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied simple regular expressions for keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different feature sets: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
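The feature types described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword pattern, the OR-combination with structured evidence, the function names, and the placeholder CUI strings are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical keyword pattern; the study's actual expressions are not
# given in the abstract.
LN_PATTERN = re.compile(r"\blupus\s+nephritis\b", re.IGNORECASE)


def regex_hit(note: str) -> bool:
    """Return True if the note text contains the keyword pattern."""
    return LN_PATTERN.search(note) is not None


def rule_based_label(note: str, has_structured_evidence: bool) -> bool:
    """Combine the keyword search with a structured-data flag.

    Assumption: the two sources are combined with a logical OR. The
    abstract only says they were 'combined'.
    """
    return regex_hit(note) or has_structured_evidence


def cui_features(cuis, vocabulary, use_counts):
    """Turn the CUIs extracted from one patient's notes into a vector.

    use_counts=False gives a binary 'mentioned or not' feature per CUI;
    use_counts=True gives the number of appearances of each CUI.
    """
    counts = Counter(cuis)
    if use_counts:
        return [counts[c] for c in vocabulary]
    return [1 if counts[c] > 0 else 0 for c in vocabulary]


# Placeholder identifiers, not real UMLS CUIs.
vocabulary = ["CUI_A", "CUI_B", "CUI_C"]
extracted = ["CUI_A", "CUI_A", "CUI_B"]

binary_vector = cui_features(extracted, vocabulary, use_counts=False)
count_vector = cui_features(extracted, vocabulary, use_counts=True)
```

Vectors like these would then be passed to a regularized logistic regression classifier. Note that a plain keyword match such as this one does not handle negation (for example, "no evidence of lupus nephritis" would still match).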

RESULTS

Our best-performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs), and improved the F-measure over the baseline lupus nephritis algorithm in both the NMEDW (0.79 vs. 0.41) and VUMC (0.93 vs. 0.52) datasets.
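For readers unfamiliar with the metric, the F-measure can be computed from a confusion matrix as in the sketch below. It assumes the balanced F1 variant (beta = 1), which the abstract does not state explicitly, and the counts in the example are invented for illustration, not taken from the study.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positives, false positives, and false negatives.

    With beta = 1 this is the harmonic mean of precision and recall.
    Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts: 8 correctly identified cases, 2 false alarms, 2 misses.
example_score = f_measure(tp=8, fp=2, fn=2)
```

Because the score combines precision and recall, an algorithm that misses many true cases scores poorly even when most of its positive calls are correct.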

CONCLUSION

Our NLP MetaMap mixed model greatly improved the F-measure compared with the structured-data-only algorithm in both the internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better-targeted therapies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bfec/10910523/d7877b0838f4/12911_2024_2420_Fig1_HTML.jpg
