Castelo Robert, Roverato Alberto
Research Program on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain.
Methods Mol Biol. 2012;802:215-33. doi: 10.1007/978-1-61779-400-1_14.
Regulatory networks inferred from microarray data sets provide an estimated blueprint of the functional interactions taking place under the assayed experimental conditions. In each of these experiments, the gene expression pathway exerts a finely tuned control simultaneously over all genes relevant to the cellular state. This renders most pairs of those genes significantly correlated, so the challenge facing every method that aims to infer a molecular regulatory network from microarray data lies in distinguishing direct from indirect interactions. A straightforward solution to this problem would be to move from bivariate to multivariate statistical approaches. However, the daunting dimension of typical microarray data sets, with a number of genes p several orders of magnitude larger than the number of samples n, precludes the application of standard multivariate techniques and confronts the biologist with sophisticated procedures that address this situation. We have introduced a new way to approach this problem in an intuitive manner, based on limited-order partial correlations, and in this chapter we illustrate this method through the R package qpgraph, which forms part of the Bioconductor project and is available at its Web site.
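To make the distinction between direct and indirect interactions concrete, the following minimal sketch simulates two genes that are driven by a common regulator and compares their marginal correlation with their first-order partial correlation given that regulator. It is an illustration only and does not reproduce the qpgraph package or its API: the variable names, the simulated data, the noise level, and the choice of order 1 are all assumptions made for this example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr_order1(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data (not from the chapter): a regulator z drives both
# x and y, so x and y are associated only indirectly, through z.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

marginal = pearson(x, y)               # strong, although x and y never interact
partial = partial_corr_order1(x, y, z)  # close to zero once z is accounted for

print(f"marginal correlation of x and y:       {marginal:.3f}")
print(f"partial correlation of x and y given z: {partial:.3f}")
```

Conditioning on one variable at a time requires only a handful of samples, which is why low-order partial correlations remain computable when p is much larger than n, whereas the full-order partial correlation (conditioning on all remaining genes) is not.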