Zhang Jiexin, Ji Yuan, Zhang Li
Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Boulevard, Unit 237, Houston, TX 77030-4009, USA.
Bioinformatics. 2007 Nov 1;23(21):2903-9. doi: 10.1093/bioinformatics/btm482. Epub 2007 Oct 5.
Extracting gene network information from high-throughput genomic data is an important and difficult task. A common approach is to cluster genes using pairwise correlation as a distance metric. However, pairwise correlation is too simplistic to describe the complex relationships among real genes, since co-expression relationships are often restricted to a specific set of biological conditions or processes. In this study, we describe a three-way gene interaction model that captures the dynamic nature of the co-expression relationship between a gene pair through the introduction of a controller gene.
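One way to make the controller-gene idea concrete is sketched below. This is a minimal illustration and not the authors' statistical test, which the abstract does not specify. It assumes that samples are split at the median of the controller gene and that the two resulting correlations of the gene pair are compared with a Fisher z statistic. The function name `three_way_z` and the synthetic data are hypothetical.

```python
import numpy as np

def three_way_z(x1, x2, controller):
    """Compare corr(x1, x2) between controller-high and controller-low samples.

    Returns a Fisher z statistic; a large |z| suggests that the co-expression
    of the pair depends on the controller. The median split is an assumption.
    """
    x1, x2, controller = (np.asarray(a, dtype=float) for a in (x1, x2, controller))
    high = controller > np.median(controller)
    r_high = np.corrcoef(x1[high], x2[high])[0, 1]
    r_low = np.corrcoef(x1[~high], x2[~high])[0, 1]
    n_high, n_low = high.sum(), (~high).sum()
    # Fisher transform; clip to avoid infinities when |r| is exactly 1.
    z_high, z_low = np.arctanh(np.clip([r_high, r_low], -0.999999, 0.999999))
    se = np.sqrt(1.0 / (n_high - 3) + 1.0 / (n_low - 3))
    return (z_high - z_low) / se

# Synthetic data (not from the paper): the controller flips the sign of the
# coupling between x1 and x2, so the pair shows little overall correlation.
rng = np.random.default_rng(0)
n = 678
controller = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = np.where(controller > 0, 1.0, -1.0) * x1 + 0.5 * rng.normal(size=n)

z = three_way_z(x1, x2, controller)
overall_r = np.corrcoef(x1, x2)[0, 1]
print(f"three-way z = {z:.1f}, overall pairwise r = {overall_r:.2f}")
```

In this toy setting the pair is positively correlated when the controller is high and negatively correlated when it is low, so the overall pairwise correlation stays close to zero while the three-way statistic is large.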
We surveyed 0.4 billion possible three-way interactions among 1000 genes in a microarray dataset containing 678 human cancer samples. To test the reproducibility and statistical significance of our results, we randomly split the samples into a training set and a testing set. We found that the gene triplets with the strongest interactions (i.e. with the smallest P-values from appropriate statistical tests) in the training set also had the strongest interactions in the testing set. A distinctive pattern of three-way interaction emerged from these gene triplets: depending on whether the third gene is expressed, the remaining two genes can be either co-expressed or mutually exclusive (i.e. expression of either one of them would repress the other). Such three-way interactions can exist without apparent pairwise correlations. The identified three-way interactions are candidates for further experimentation using techniques such as RNA interference, so that novel gene networks or pathways can be identified.
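The split-sample reproducibility check described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: the score is the mean standardized triple product, chosen here only as a simple proxy for three-way interaction strength, and the data are synthetic with one planted interaction. The helper `triple_product_score` and all sizes other than the 678 samples are hypothetical.

```python
from itertools import combinations

import numpy as np

def triple_product_score(data, triplets):
    """Mean of z_i * z_j * z_k over samples for each gene triplet.

    data: (samples, genes) array; triplets: (m, 3) array of gene indices.
    This score is an assumed proxy, not the test used in the paper.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    i, j, k = triplets.T
    return (z[:, i] * z[:, j] * z[:, k]).mean(axis=0)

rng = np.random.default_rng(1)
n_samples, n_genes = 678, 20
data = rng.normal(size=(n_samples, n_genes))
# Plant one interaction: gene 2 flips the sign of the coupling of genes 0 and 1.
data[:, 1] = np.sign(data[:, 2]) * data[:, 0] + 0.5 * rng.normal(size=n_samples)

# The score is symmetric in the three genes, so unordered triplets suffice.
triplets = np.array(list(combinations(range(n_genes), 3)))

# Random split into training and testing halves.
perm = rng.permutation(n_samples)
train, test = perm[: n_samples // 2], perm[n_samples // 2 :]

score_train = triple_product_score(data[train], triplets)
score_test = triple_product_score(data[test], triplets)

# Rank triplets by absolute score in each half and compare the top ten.
top_train = np.argsort(-np.abs(score_train))[:10]
top_test = np.argsort(-np.abs(score_test))[:10]
overlap = np.intersect1d(top_train, top_test)

print("top training triplet:", tuple(triplets[top_train[0]]))
print("top testing triplet:", tuple(triplets[top_test[0]]))
print("triplets shared by both top-10 lists:", len(overlap))
```

A triplet that ranks at the top in both halves is less likely to be a chance finding, which is the reasoning behind the training and testing comparison. With real data the number of triplets is far larger, so a vectorized or chunked evaluation would be needed.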