The Broad Institute of Harvard and MIT, Cancer Program, Cambridge, MA, USA.
IET Syst Biol. 2010 Nov;4(6):428-40. doi: 10.1049/iet-syb.2010.0009.
A critical task in systems biology is the identification of genes that interact to control cellular processes by transcriptional activation of a set of target genes. Many methods have been developed that use statistical correlations in high-throughput data sets to infer such interactions. However, cellular pathways are highly cooperative, often requiring the joint effect of many molecules. Few methods have been proposed to explicitly identify such higher-order interactions, partly because the notion of multivariate statistical dependence itself remains imprecisely defined. The authors define the concept of dependence among multiple variables using maximum entropy techniques and introduce computational tests for identifying such dependencies. Results on synthetic networks show that this procedure uncovers dependencies even in undersampled regimes, where the joint probability distribution cannot be reliably estimated. Analysis of microarray data from human B cells shows that third-order statistics, but not second-order ones, uncover relationships between genes that interact in a pathway to cooperatively regulate a common set of targets.
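To make the distinction between second-order and third-order dependence concrete, the following is a minimal sketch on a toy example, not the authors' test. It assumes three binary variables where Z = X XOR Y, and it quantifies third-order dependence as the entropy gap between the true joint distribution and the maximum entropy distribution that matches all pairwise marginals. That maximum entropy distribution is approximated here by iterative proportional fitting. The function names (`entropy_bits`, `mutual_information_bits`, `maxent_pairwise`) and the XOR setup are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability cells are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information_bits(pxy):
    """Pairwise mutual information I(X;Y) in bits from a 2-D joint table."""
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    return entropy_bits(px) + entropy_bits(py) - entropy_bits(pxy)

def maxent_pairwise(p, n_iter=200):
    """Approximate the maximum entropy distribution over three variables
    that matches every pairwise marginal of p, using iterative
    proportional fitting started from the uniform distribution."""
    q = np.full(p.shape, 1.0 / p.size)
    for _ in range(n_iter):
        for axis in range(3):
            # Summing out one variable leaves the marginal of the other two.
            target = p.sum(axis=axis, keepdims=True)
            current = q.sum(axis=axis, keepdims=True)
            ratio = np.divide(target, current,
                              out=np.zeros_like(target), where=current > 0)
            q = q * ratio
    return q

# Toy joint distribution (assumption): X, Y are fair independent bits, Z = X XOR Y.
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25

# Second-order statistics: mutual information of each pair of variables.
pairwise_mi = {
    "X,Y": mutual_information_bits(p.sum(axis=2)),
    "X,Z": mutual_information_bits(p.sum(axis=1)),
    "Y,Z": mutual_information_bits(p.sum(axis=0)),
}

# Third-order dependence: information not explained by any pairwise model.
q2 = maxent_pairwise(p)
third_order = entropy_bits(q2) - entropy_bits(p)

print("pairwise MI (bits):", pairwise_mi)
print("third-order dependence (bits):", third_order)
```

In this toy case every pairwise mutual information is zero, while the third-order term is about one bit: no pair of variables looks related, yet the three are fully dependent when taken together. This mirrors, in miniature, the situation the abstract describes, where second-order statistics miss a cooperative relationship that third-order statistics reveal.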