Department of Mathematics, Shanghai Normal University, Shanghai, China.
BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S5. doi: 10.1186/1471-2105-14-S5-S5. Epub 2013 Apr 10.
Identifying gene-phenotype relationships is a fundamental challenge in human health research. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, many network-based approaches have been proposed, each built on a different underlying model. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance.
In this paper, we propose a new diffusion-based method to prioritize candidate disease genes. The diffusion profile of a disease is defined as the stationary distribution over candidate genes of a random walk with restart that incorporates similarities between phenotypes. Candidate disease genes are then prioritized by comparing their diffusion profiles with that of the disease. The effectiveness of the method was demonstrated through leave-one-out cross-validation against control genes from artificial linkage intervals and against randomly chosen genes. A comparative study showed that our method achieves improved performance compared with some classical diffusion-based methods. To further illustrate the method, we used our algorithm to predict new causal genes for 16 multifactorial diseases, including prostate cancer and Alzheimer's disease; the top predictions were in good agreement with literature reports.
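The prioritization idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the restart probability of 0.7, the seed weights standing in for phenotype similarities, the use of cosine similarity to compare profiles, and all function names (`column_normalize`, `rwr`, `prioritize`) are assumptions made for illustration only.

```python
import numpy as np

def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix."""
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    return adj / col_sums

def rwr(W, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p = (1 - r) W p + r p0 to convergence."""
    p0 = seed / seed.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def cosine(a, b):
    """Cosine similarity between two profiles (an assumed comparison measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def prioritize(adj, seed_weights, candidates, restart=0.7):
    """Rank candidates by similarity of their diffusion profile to the disease's."""
    W = column_normalize(adj.astype(float))
    disease_profile = rwr(W, seed_weights, restart)
    scores = {}
    for g in candidates:
        e = np.zeros(adj.shape[0])
        e[g] = 1.0  # candidate's profile: a walk restarted at the candidate itself
        scores[g] = cosine(rwr(W, e, restart), disease_profile)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy network: two triangles (genes 0-1-2 and 3-4-5) joined by edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0

# Assumed seeds: gene 0 is a known gene of the query disease (weight 1.0);
# gene 2 is linked to a similar phenotype (weight 0.3, a made-up similarity).
seed_weights = np.array([1.0, 0.0, 0.3, 0.0, 0.0, 0.0])

ranking = prioritize(adj, seed_weights, candidates=[1, 4])
print(ranking)
```

In this toy setting, gene 1 sits in the same cluster as the seeds, so its profile overlaps the disease profile far more than that of gene 4 and it is ranked first. Comparing whole profiles, rather than reading off a single stationary probability, is what makes the score a global similarity measure.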
Our study indicates that integrating multiple information sources, especially phenotype similarity profile data, and introducing a global similarity measure between disease and gene diffusion profiles both help in prioritizing candidate disease genes.
Programs and data are available upon request.