Department of Computer Science, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, China.
Department of Automation, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST, China.
Methods. 2018 Aug 1;145:41-50. doi: 10.1016/j.ymeth.2018.06.002. Epub 2018 Jun 3.
Genome-wide association studies (GWAS) have successfully discovered a number of disease-associated genetic variants in the past decade, providing an unprecedented opportunity for deciphering the genetic basis of human inherited diseases. However, extracting biological knowledge from GWAS data remains challenging, owing to issues such as missing heritability and weak interpretability. Indeed, the majority of discovered loci fall into noncoding regions without clear links to genes, which has hindered the characterization of their functions and calls for a sophisticated approach to bridge genetic and genomic studies. To address this problem, network-based prioritization of candidate genes, which performs integrated analysis of gene networks with GWAS data, has emerged as a promising direction and attracted much attention. However, most existing methods overlook the sparse and noisy properties of gene networks and thus may lead to suboptimal performance. Motivated by this understanding, we proposed a novel method called REGENT for integrating multiple gene networks with GWAS data to prioritize candidate genes for complex diseases. We leveraged a technique called network representation learning to embed a gene network into a compact and robust feature space, and then designed a hierarchical statistical model to integrate features of multiple gene networks with GWAS data for the effective inference of genes associated with a disease of interest. We applied our method to six complex diseases and demonstrated the superior performance of REGENT over existing approaches in recovering known disease-associated genes. We further conducted a pathway analysis and demonstrated the ability of REGENT to discover disease-associated pathways. We expect to see applications of our method to a broad spectrum of diseases for post-GWAS analysis. REGENT is freely available at https://github.com/wmmthu/REGENT.
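To make the idea of embedding a gene network into a compact feature space concrete, the following is a minimal sketch only. It uses a plain spectral embedding (eigenvectors of the normalized graph Laplacian) as a stand-in technique; it is not the network representation learning method used by REGENT, and the gene names, edges, and the function name `spectral_embedding` are hypothetical and chosen purely for illustration.

```python
import numpy as np

def spectral_embedding(adjacency, dim):
    """Embed the nodes of an undirected graph into `dim` features using
    eigenvectors of the symmetric normalized Laplacian.
    Illustrative stand-in only; not the embedding used by REGENT."""
    a = np.asarray(adjacency, dtype=float)
    degree = a.sum(axis=1)
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    safe_degree = np.where(degree > 0, degree, 1.0)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(safe_degree), 0.0)
    laplacian = np.eye(len(a)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Skip the first (trivial) eigenvector; keep the next `dim` as features.
    return eigvecs[:, 1:dim + 1]

# Toy network (hypothetical): two triangles of genes joined by one edge.
genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adjacency = np.zeros((len(genes), len(genes)))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Each row is a 2-dimensional feature vector for one gene.
features = spectral_embedding(adjacency, dim=2)
```

In this toy graph the first feature takes opposite signs on the two triangles, so genes in the same densely connected module end up close together in the feature space. Features of this kind, one set per network, are the sort of input that a downstream statistical model could combine with gene-level GWAS evidence.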