School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China.
Institute for TCM-X, MOE Key Laboratory of Bioinformatics / Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, 100084, China.
Hum Genet. 2021 Jun;140(6):897-913. doi: 10.1007/s00439-020-02253-0. Epub 2021 Jan 7.
Disease gene identification is a critical step towards uncovering the molecular mechanisms of diseases and systematically investigating complex disease phenotypes. Despite considerable efforts to develop powerful computational methods, candidate gene identification remains a severe challenge owing to the limited connectivity of the incomplete interactome network, which hampers the discovery of truly novel candidate genes. We developed a network-based machine-learning framework to identify both functional modules and disease candidate genes. In this framework, we designed a semi-supervised non-negative matrix factorization model to obtain functional modules related to diseases and genes. Notably, we proposed a disease gene prioritization method called MapGene that integrates correlations derived from both functional modules and network closeness. Our framework identified a set of functional modules with high functional homogeneity and close gene interactions. Experiments on a large-scale benchmark dataset showed that MapGene performs significantly better than state-of-the-art algorithms. Further analysis demonstrated that MapGene can effectively mitigate the impact of interactome incompleteness and obtain highly reliable rankings of candidate genes. In addition, case studies on Parkinson's disease and diabetes mellitus confirmed the generalizability of MapGene for novel candidate gene identification. This work proposes, for the first time, an integrated computational framework to predict both functional modules and disease candidate genes. The methodology and results indicate that our framework has the potential to help discover underlying functional modules and reliable candidate genes in human disease.
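The abstract does not give the semi-supervised NMF objective or the exact way MapGene weights its two signals, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions. Plain unsupervised NMF (Lee-Seung multiplicative updates) on the interactome adjacency matrix stands in for the paper's semi-supervised model; "module similarity" is assumed to be the mean cosine similarity between a candidate's module loadings and those of known disease (seed) genes; "network closeness" is assumed to be the mean of 1/(1 + shortest-path distance) to the seeds; and the two are blended with a hypothetical weight `alpha`. The function names (`nmf`, `closeness_to_seeds`, `rank_candidates`, etc.) and the scoring formula are illustrative only, not MapGene's actual implementation.

```python
# Hypothetical sketch of module-plus-closeness gene ranking; NOT the MapGene implementation.
import math
import random
from collections import deque


def matmul(a, b):
    """Dense matrix product for matrices stored as lists of row lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(x, k, iters=300, seed=0, eps=1e-9):
    """Plain NMF, X ~ W H, via Lee-Seung multiplicative updates for the squared
    Frobenius error. Stand-in only: not the paper's semi-supervised model."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)  # H <- H * (W^T X) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))  # W <- W * (X H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def build_graph(n_genes, edges):
    """Adjacency sets plus a symmetric 0/1 adjacency matrix for an undirected network."""
    adj = {g: set() for g in range(n_genes)}
    mat = [[0.0] * n_genes for _ in range(n_genes)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        mat[u][v] = mat[v][u] = 1.0
    return adj, mat


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_to_seeds(adj, gene, seeds):
    """Assumed closeness: mean of 1 / (1 + d(gene, s)); unreachable seeds contribute 0."""
    dist = bfs_distances(adj, gene)
    return sum(1.0 / (1 + dist[s]) if s in dist else 0.0 for s in seeds) / len(seeds)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def rank_candidates(n_genes, edges, seeds, k=2, alpha=0.5):
    """Hypothetical integrated score: alpha * module similarity + (1 - alpha) * closeness."""
    adj, mat = build_graph(n_genes, edges)
    w, _ = nmf(mat, k)  # row w[g] = soft membership of gene g in the k modules
    scores = {}
    for g in range(n_genes):
        if g in seeds:
            continue  # known disease genes are not re-ranked
        module_sim = sum(cosine(w[g], w[s]) for s in seeds) / len(seeds)
        scores[g] = alpha * module_sim + (1 - alpha) * closeness_to_seeds(adj, g, seeds)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # module A
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),  # module B
             (3, 4)]                                           # bridge between modules
    print(rank_candidates(8, edges, seeds={0, 1}))
```

In this toy network, genes 2 and 3 share a dense module with the seed genes 0 and 1, so they would be expected to rank ahead of genes 5 to 7 in the other module. Setting `alpha=0.0` reduces the score to network closeness alone, which makes it easy to see how much the module term changes the ranking; the point of combining the two signals is that a gene weakly connected in an incomplete network can still score well through shared module membership.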