Bioinformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore, Singapore.
PLoS One. 2011;6(7):e21502. doi: 10.1371/journal.pone.0021502. Epub 2011 Jul 25.
BACKGROUND: Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the phenome and interactome networks to elucidate the gene-phenotype associations of diseases.
METHODOLOGY/PRINCIPAL FINDINGS: We propose a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. RWPCN is built on a protein complex network constructed from existing human protein complexes and a protein interaction network. To prioritize candidate disease genes for a query disease phenotype, we compute the associations between the protein complexes and the query phenotype in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation, where it outperformed existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, breast cancer and diabetes.
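The prioritization step can be illustrated with a minimal sketch. The abstract does not give the update rule, so the code below assumes a standard random walk with restart over a weighted complex-complex network, and it assumes that a gene is scored by summing the steady-state scores of the complexes that contain it. The complex names, gene names, edge weights, seed values, and the restart probability of 0.3 are all hypothetical and are not taken from the paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart.

    adjacency: dict mapping node -> {neighbour: weight}; assumed symmetric.
    seeds: dict mapping node -> prior weight (e.g. association with the query phenotype).
    """
    nodes = sorted(adjacency)
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    total_seed = sum(seeds.values())
    # Restart distribution: normalised seed weights, zero for non-seed nodes.
    p0 = {u: seeds.get(u, 0.0) / total_seed for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: it has no edges to walk along
            for v, w in adjacency[u].items():
                # Move from u to v with probability proportional to the edge weight.
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy data: three complexes in a chain C1 - C2 - C3.
complex_network = {
    "C1": {"C2": 1.0},
    "C2": {"C1": 1.0, "C3": 1.0},
    "C3": {"C2": 1.0},
}
complex_members = {
    "C1": {"A", "B"},
    "C2": {"B", "C"},
    "C3": {"D", "E"},
}
# Assumption: only C1 is already associated with the query phenotype.
seed_complexes = {"C1": 1.0}

scores = random_walk_with_restart(complex_network, seed_complexes)

# Assumed gene scoring rule: sum the scores of every complex containing the gene.
gene_scores = {}
for cpx, members in complex_members.items():
    for gene in members:
        gene_scores[gene] = gene_scores.get(gene, 0.0) + scores[cpx]

ranked = sorted(gene_scores.items(), key=lambda kv: (-kv[1], kv[0]))
for gene, score in ranked:
    print(f"{gene}\t{score:.4f}")
```

In this toy network, gene B ranks first because it belongs to both the seed complex and its direct neighbour, which is the kind of modular evidence a complex-level walk is meant to propagate.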
CONCLUSIONS/SIGNIFICANCE: Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organization of both the disease phenome and the protein interactome. RWPCN uses a novel protein complex network as the basis for interrogating the human phenome-interactome network. Because the protein complex network captures the underlying modularity of biological interaction networks better than simple protein interaction networks do, RWPCN detected and prioritized disease genes better than traditional approaches that used only protein-phenotype associations.