Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA 19122 USA.
Department of Biology, Temple University, Philadelphia, PA 19122 USA.
Sci Rep. 2016 Aug 31;6:32404. doi: 10.1038/srep32404.
Data-driven phenotype analyses of Electronic Health Record (EHR) data have recently brought benefits to many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, yielding significant improvements in disease phenotyping over current computational approaches. In addition, the proposed methodology is extended to discover novel disease-gene associations by incorporating domain knowledge from genome-wide association studies. The approach is evaluated against a held-out set, where it again yields compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations have not been studied previously. Our approach thus provides biomedical researchers with new tools to filter genes of interest, reducing the need for costly lab studies.