School of Mathematics, Shandong University, Jinan, Shandong 250100, PR China.
BMC Bioinformatics. 2009 Sep 3;10:277. doi: 10.1186/1471-2105-10-277.
Many aspects of biological function can be modeled by biological networks, such as protein interaction networks, metabolic networks, and gene coexpression networks. Studying the statistical properties of these networks in turn allows us to infer biological function. More complex statistical network models can potentially describe these networks more accurately, but it is not clear whether such models are better suited to finding biologically meaningful subnetworks.
Recent studies have shown that the degree distribution of the nodes is not an adequate statistic for many molecular networks. We sought to extend this statistic with second- and third-order degree correlations, and developed a pseudo-likelihood approach to estimate the model parameters. We applied the approach to the MIPS and BIOGRID yeast protein interaction networks and to two yeast gene coexpression networks. We showed that second-order degree correlation information gave better predictions of gene interactions in both the protein interaction and the gene coexpression networks. However, in the biologically important task of predicting functionally homogeneous modules, degree correlation information performed marginally better for the MIPS and BIOGRID protein interaction networks, but worse for the gene coexpression networks.
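In dK-series terms, dK-1 is the degree distribution, dK-2 is the joint degree distribution of connected node pairs (the second-order degree correlations), and dK-3 is the degree distribution of connected node triples, split into open wedges and triangles (the third-order correlations). The sketch below shows one way these counts can be tallied from an undirected edge list. `dk_statistics` is a hypothetical helper, not the authors' code, and the keying conventions (sorted degree pairs, wedge centre degree in the middle) are choices made here for illustration.

```python
from collections import Counter
from itertools import combinations


def dk_statistics(edges):
    """Tally dK-1, dK-2 and dK-3 counts for a simple undirected graph.

    Hypothetical helper (illustration only). Returns (dk1, dk2, wedges, triangles):
      dk1       degree k -> number of nodes with that degree
      dk2       (k1, k2) with k1 <= k2 -> number of edges joining such nodes
      wedges    (k_end1, k_centre, k_end2), ends sorted -> open 2-paths
      triangles sorted (k1, k2, k3) -> number of triangles
    Isolated nodes do not appear, because the input is an edge list.
    """
    # Collapse duplicate/reversed edges and drop self-loops.
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    adj = {}
    for e in edge_set:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}

    # dK-1: the ordinary degree distribution (as counts).
    dk1 = Counter(deg.values())

    # dK-2: second-order degree correlations, one count per undirected edge.
    dk2 = Counter(tuple(sorted(deg[x] for x in e)) for e in edge_set)

    # dK-3: third-order correlations over connected node triples.
    wedges = Counter()
    closed = set()
    for b, nbrs in adj.items():  # b is the centre of the triple
        for a, c in combinations(nbrs, 2):
            if c in adj[a]:
                closed.add(frozenset((a, b, c)))  # each triangle is seen 3 times
            else:
                ka, kc = sorted((deg[a], deg[c]))
                wedges[(ka, deg[b], kc)] += 1
    triangles = Counter(tuple(sorted(deg[x] for x in t)) for t in closed)

    return dk1, dk2, wedges, triangles
```

For a triangle 1-2-3 with a pendant node 4 attached to node 3, the call returns one `(2, 2, 3)` triangle and two open wedges keyed `(1, 3, 2)`. Each level constrains the one below it: for nodes of nonzero degree, the dK-2 counts already determine the dK-1 counts.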
Our use of dK models showed that incorporating degree correlations can increase predictive power in some contexts, albeit sometimes only marginally, but that in all contexts the use of third-order degree correlations decreased accuracy. It remains possible, however, that other parameter estimation methods, such as maximum likelihood, would show the usefulness of incorporating second- and third-order degree correlations in predicting functionally homogeneous modules.
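To make the contrast between pseudo-likelihood and maximum likelihood concrete, the sketch below follows the generic maximum pseudo-likelihood construction for exponential-family random graph models: the likelihood is replaced by the product over dyads of P(y_ij | rest of the graph), which reduces to a logistic regression of each observed tie on its change statistics. This is not the authors' estimator. As an assumption for brevity, the edge count and triangle count stand in for the paper's degree-correlation terms, and `mple`, `edges_triangles`, `_sigmoid`, `steps` and `lr` are hypothetical names.

```python
import math
from itertools import combinations


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def edges_triangles(adj, i, j):
    """Hypothetical change statistics for dyad (i, j), rest of graph fixed.

    Turning the tie on adds one edge and one triangle per common neighbour.
    These stand in (assumption) for degree-correlation statistics.
    """
    return [1.0, float(len(adj[i] & adj[j]))]


def mple(nodes, edges, change_stats=edges_triangles, steps=5000, lr=0.5):
    """Maximise the log pseudo-likelihood by batch gradient ascent.

    Each dyad contributes y*eta - log(1 + e^eta), with eta = theta . delta_ij,
    i.e. a logistic regression of observed ties on their change statistics.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    # One observation per unordered dyad: (change statistics, observed tie).
    data = [
        (change_stats(adj, i, j), 1.0 if j in adj[i] else 0.0)
        for i, j in combinations(nodes, 2)
    ]
    dim = len(data[0][0])
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            p = _sigmoid(sum(t * xk for t, xk in zip(theta, x)))
            for k in range(dim):
                grad[k] += (y - p) * x[k]
        theta = [t + lr * g / len(data) for t, g in zip(theta, grad)]
    return theta
```

With the edge count as the only statistic, dyads are independent and the pseudo-likelihood coincides with the full likelihood, so the estimate is just the log-odds of the network density. Once dependence-inducing terms such as triangles or degree correlations enter, the two estimators generally differ, which is why maximum likelihood could lead to different conclusions.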