Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan.
PLoS One. 2007 Jun 27;2(6):e562. doi: 10.1371/journal.pone.0000562.
Gene regulatory networks (GRNs) have become a major focus of interest in recent years. A number of reverse engineering approaches have been developed to help uncover the regulatory networks that give rise to observed gene expression profiles. However, this is an overspecified problem because more than one genotype (network wiring) can give rise to the same phenotype. We refer to this phenomenon as "gene elasticity." In this work, we study the effect of this particular problem on the purely data-driven inference of gene regulatory networks.
We simulated a four-gene network to produce "data" (protein levels) that we used in lieu of real experimental data. We then optimized the network connections between the four genes with a view to recovering the original network that gave rise to the data. We did this for two cases: one in which only the network connections were optimized, and another in which both the network connections and the kinetic parameters (given as reaction probabilities in our case) were estimated. We observed that multiple genotypes gave rise to very similar protein levels. Statistical experimentation indicates that it is impossible to differentiate between the different networks on the basis of either equilibrium or dynamic data.
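The degeneracy described above can be illustrated with a small toy model. The sketch below is not the simulation used in this study: it swaps the reaction-probability model for a deterministic, saturating rate equation, and the wiring `TRUE_W`, the basal rates `BASAL`, and both function names are hypothetical choices made only for illustration. It holds the regulators of genes A, B, and D fixed, enumerates all 81 possible regulator sets for gene C, and keeps those whose equilibrium protein levels match the levels produced by the "true" wiring.

```python
from itertools import product

# Hypothetical basal production rates for genes A, B, C, D (illustration only).
BASAL = [1.0, 0.0, 0.0, 0.5]

# Hypothetical "true" wiring. TRUE_W[i][j] is the effect of gene j on gene i:
# +1 activates, -1 represses, 0 means no connection.
TRUE_W = [
    [0, 0, 0, 0],    # A: constitutively expressed
    [1, 0, 0, 0],    # B: activated by A
    [1, 1, 0, 0],    # C: activated by A and B
    [0, 0, -1, 0],   # D: repressed by C
]


def steady_state(w, steps=400, dt=0.1):
    """Integrate dx_i/dt = clamp(basal_i + sum_j w[i][j] * x_j, 0, 1) - x_i
    with forward Euler from zero protein levels and return the final levels."""
    x = [0.0] * 4
    for _ in range(steps):
        production = [
            min(1.0, max(0.0, BASAL[i] + sum(w[i][j] * x[j] for j in range(4))))
            for i in range(4)
        ]
        x = [x[i] + dt * (production[i] - x[i]) for i in range(4)]
    return x


def matching_rows_for_c(tol=1e-6):
    """Return every regulator set for gene C whose final protein levels lie
    within tol of the levels produced by TRUE_W."""
    target = steady_state(TRUE_W)
    matches = []
    for row in product((-1, 0, 1), repeat=4):
        w = [list(r) for r in TRUE_W]
        w[2] = list(row)  # replace only the inputs to gene C
        levels = steady_state(w)
        if max(abs(a - b) for a, b in zip(levels, target)) < tol:
            matches.append(row)
    return matches


if __name__ == "__main__":
    matches = matching_rows_for_c()
    print(len(matches), "of 81 candidate regulator sets for C match the target")
```

In this toy model the effect arises from saturation: once A alone drives C to its maximum level, the additional input from B leaves the equilibrium unchanged, so wirings with and without the B-to-C connection cannot be separated from equilibrium protein levels. The sketch compares equilibrium levels only and does not address the dynamic data or the statistical tests mentioned above.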
We show explicitly that reverse engineering of GRNs from pure expression data is an indeterminate problem. Our results suggest that an inferential, purely data-driven approach is unsuitable for reverse engineering transcriptional networks when the gene regulatory networks display a certain level of complexity.