Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN 55455, USA.
Bioinformatics. 2011 Oct 1;27(19):2692-9. doi: 10.1093/bioinformatics/btr463. Epub 2011 Aug 8.
To validate candidate disease genes identified in high-throughput genomic studies, a necessary step is to elucidate the associations between the set of candidate genes and disease phenotypes. Conventional gene set enrichment analysis often fails to reveal associations between disease phenotypes and gene sets that contain only a short list of poorly annotated genes, because the existing annotations of disease-causative genes are incomplete. This article introduces a network-based computational approach called rcNet to discover the associations between gene sets and disease phenotypes. A learning framework is proposed to maximize the coherence between the predicted phenotype-gene set relations and the known disease phenotype-gene associations. An efficient algorithm coupling ridge regression with label propagation, along with two variants, is designed to find the optimal solution to the objective functions of the learning framework.
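As a rough illustration of how ridge regression can be coupled with label propagation, the following toy sketch smooths a query gene set and the known phenotype-gene profiles over a gene network, then fits ridge coefficients that score each phenotype. It is not the rcNet objective or algorithm from the article. The names `W` (gene network), `A` (phenotype-by-gene association matrix), `alpha` (propagation weight) and `lam` (ridge penalty), as well as the toy data, are all assumptions made for this example.

```python
import numpy as np

def normalize_network(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an adjacency matrix."""
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree, dtype=float)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate_labels(S, y, alpha=0.5):
    """Closed-form label propagation: f = (1 - alpha) (I - alpha S)^{-1} y."""
    n = S.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)

def score_phenotypes(W, A, query, alpha=0.5, lam=1.0):
    """Return one ridge coefficient per phenotype (row of A) for a query gene set.

    Assumed toy formulation, not the published one:
    minimize ||f - A_s^T b||^2 + lam * ||b||^2,
    where f and A_s are the network-smoothed query and phenotype profiles.
    """
    S = normalize_network(W)
    f = propagate_labels(S, query, alpha)            # smoothed query gene set
    A_s = propagate_labels(S, A.T, alpha).T          # smoothed phenotype profiles
    gram = A_s @ A_s.T + lam * np.eye(A.shape[0])
    return np.linalg.solve(gram, A_s @ f)            # ridge solution

# Toy gene network with two components: genes 0-1-2 and genes 3-4.
W = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Phenotype 0 is annotated with gene 0, phenotype 1 with gene 4.
A = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

# Query set: the single unannotated gene 1, a network neighbor of gene 0.
query = np.array([0, 1, 0, 0, 0], dtype=float)

scores = score_phenotypes(W, A, query)
print(scores)
```

In this toy network the query gene has no direct annotation, so a plain overlap test would score both phenotypes as zero. After propagation, phenotype 0 receives the higher score because its annotated gene is connected to the query gene.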
We evaluated the rcNet algorithms with leave-one-out cross-validation on Online Mendelian Inheritance in Man (OMIM) data and an independent test set of recently discovered disease-gene associations. In the experiments, the rcNet algorithms achieved the best overall rankings compared with the baselines. To further validate the reproducibility of the performance, we applied the algorithms to identify the target diseases of novel candidate disease genes obtained from recent genome-wide association studies (GWAS), DNA copy number variation analyses and gene expression profiling studies. In many cases across all three case studies, the algorithms ranked the target disease of the candidate genes at the top of the rank list.
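A leave-one-out ranking evaluation of this general kind can be sketched as follows. The scoring function, the gene similarity matrix `G`, the tie-handling rule and the toy data are hypothetical placeholders chosen for illustration; they do not reproduce the article's experimental protocol or the OMIM data.

```python
import numpy as np

def rank_of_target(scores, target):
    """1-based rank of `target` under descending scores; ties count against it."""
    scores = np.asarray(scores, dtype=float)
    return int(np.sum(scores >= scores[target]))

def leave_one_out_ranks(A, score_fn):
    """Hide each known (phenotype, gene) pair in turn and rank the hidden phenotype."""
    ranks = []
    for p, g in zip(*np.nonzero(A)):
        A_hidden = A.copy()
        A_hidden[p, g] = 0                 # hold out one known association
        query = np.zeros(A.shape[1])
        query[g] = 1.0                     # query with the held-out gene only
        ranks.append(rank_of_target(score_fn(A_hidden, query), p))
    return ranks

# Hypothetical gene similarity: genes 0 and 1 are neighbors, gene 2 is isolated.
G = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=float)

# Toy associations: phenotype 0 with genes 0 and 1, phenotype 1 with gene 2.
A = np.array([
    [1, 1, 0],
    [0, 0, 1],
], dtype=float)

def neighbor_score(A_hidden, query):
    """Placeholder scorer: overlap between annotations and the query's neighborhood."""
    return A_hidden @ (G @ query)

ranks = leave_one_out_ranks(A, neighbor_score)
print(ranks, np.mean(ranks))
```

Counting ties against the target is a deliberately pessimistic choice: a phenotype that merely ties with every other candidate is not credited with a top rank.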
http://compbio.cs.umn.edu/dgsa_rcNet