School of Computer Science and Engineering, Nanyang Technological University (NTU).
Institute for Infocomm Research (I2R), A*STAR, Singapore.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa303.
Identifying disease-gene associations through genome-wide association studies (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases requires statistical analysis of associations. Given the huge number of possible mutations, GWAS analysis suffers not only from high cost but also from a large number of false positives. Researchers therefore seek additional evidence from different sources to cross-check their results. Computational approaches come into play to provide alternative and complementary low-cost evidence for disease-gene associations. Because molecular networks can capture the complex interplay among molecules in diseases, they have become one of the most extensively used data types for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction, together with an empirical analysis of 14 state-of-the-art methods. First, we elucidate the task definition for disease gene prediction. Second, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features, and graph representation learning methods. Third, we conduct an empirical analysis to evaluate the performance of the selected methods across seven diseases, and we report distinguishing findings about the discussed methods based on this analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.