Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records.

Author information

Wang Yanshan, Zhao Yiqing, Therneau Terry M, Atkinson Elizabeth J, Tafti Ahmad P, Zhang Nan, Amin Shreyasee, Limper Andrew H, Khosla Sundeep, Liu Hongfang

Affiliations

Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.

Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.

Publication information

J Biomed Inform. 2020 Feb;102:103364. doi: 10.1016/j.jbi.2019.103364. Epub 2019 Dec 28.

Abstract

Machine learning has become ubiquitous and is a key technology for mining electronic health records (EHRs) to facilitate clinical research and practice. Unsupervised machine learning, as opposed to supervised learning, has shown promise in identifying novel patterns and relations in EHRs without using human-created labels. In this paper, we investigate the application of unsupervised machine learning models to discovering latent disease clusters and patient subgroups from EHRs. We used Latent Dirichlet Allocation (LDA), a generative probabilistic model, and proposed a novel model named the Poisson Dirichlet Model (PDM), which extends LDA by using a Poisson distribution to model patients' disease diagnoses and alleviates the effects of age and sex by considering both observed and expected observations. In empirical experiments, we evaluated LDA and PDM on three patient cohorts (Osteoporosis, Delirium/Dementia, and Chronic Obstructive Pulmonary Disease (COPD)/Bronchiectasis), whose EHR data were retrieved from the Rochester Epidemiology Project (REP) medical records linkage system, for the discovery of latent disease clusters and patient subgroups. We compared the effectiveness of LDA and PDM in identifying disease clusters by visualizing disease representations. We tested how well LDA and PDM differentiate patient subgroups through survival analysis, as well as through statistical analysis of demographics and Elixhauser Comorbidity Index (ECI) scores in those subgroups. The experimental results show that the proposed PDM can effectively identify distinct disease clusters based on the latent patterns hidden in the EHR data by alleviating the impact of age and sex, and that LDA can stratify patients into differentiable subgroups, with larger p-values than PDM. However, the subgroups identified by LDA are highly associated with patients' age and sex. Because the effects of age and sex are alleviated, the subgroups discovered by PDM may reflect underlying disease patterns of greater interest to epidemiological research. Both unsupervised machine learning approaches can be leveraged to discover patient subgroups from EHRs, but with different foci.
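The abstract does not give PDM's equations. As a minimal sketch of the general idea of modelling observed diagnosis counts against age- and sex-expected counts (indirect standardization with a Poisson likelihood), the Python below derives expected counts from stratum-specific mean rates and scores a relative rate with a Poisson log-likelihood. The toy cohort, the diagnosis names, and the helpers `stratum_rates`, `observed_expected`, and `poisson_loglik` are all hypothetical. This is not the published PDM, and it leaves out the LDA-style latent cluster structure that PDM builds on.

```python
import math
from collections import defaultdict

# Hypothetical toy cohort: (sex, age band) stratum plus diagnosis counts.
# Names and counts are illustrative only, not data from the paper.
CODES = ["osteoporosis", "copd"]
PATIENTS = {
    "p1": (("F", "65+"), {"osteoporosis": 4, "copd": 0}),
    "p2": (("F", "65+"), {"osteoporosis": 2, "copd": 2}),
    "p3": (("M", "65+"), {"osteoporosis": 0, "copd": 3}),
    "p4": (("M", "65+"), {"osteoporosis": 1, "copd": 1}),
}


def stratum_rates(patients, codes):
    """Mean count of each code per patient within each age-sex stratum."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for stratum, counts in patients.values():
        sizes[stratum] += 1
        for code in codes:
            sums[stratum][code] += counts.get(code, 0)
    return {s: {c: sums[s][c] / sizes[s] for c in codes} for s in sizes}


def observed_expected(patients, codes):
    """Return {patient: {code: (observed, expected)}}, where the expected
    count is the mean count among patients in the same age-sex stratum."""
    rates = stratum_rates(patients, codes)
    return {
        pid: {c: (counts.get(c, 0), rates[stratum][c]) for c in codes}
        for pid, (stratum, counts) in patients.items()
    }


def poisson_loglik(observed, expected, relative_rate):
    """log P(O = observed) for O ~ Poisson(expected * relative_rate)."""
    mu = expected * relative_rate
    if mu == 0:
        # A zero-mean Poisson puts all its mass on 0.
        return 0.0 if observed == 0 else -math.inf
    return observed * math.log(mu) - mu - math.lgamma(observed + 1)


if __name__ == "__main__":
    oe = observed_expected(PATIENTS, CODES)
    for pid, cells in oe.items():
        # Observed/expected ratio is the Poisson MLE of the relative rate.
        ratios = {c: (o / e if e > 0 else float("nan")) for c, (o, e) in cells.items()}
        print(pid, ratios)
```

In this toy cohort, p4 has fewer raw osteoporosis diagnoses than p1 (1 vs 4) but a higher observed/expected ratio (2.0 vs about 1.33), because osteoporosis is rare among p4's age-sex peers. That contrast is the kind of age and sex adjustment the abstract attributes to considering expected as well as observed counts.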
