Haye Alexandre, Albert Jaroslav, Rooman Marianne
BioSystems, BioModeling & BioProcesses Department, Université Libre de Bruxelles, CP 165/61, Avenue Roosevelt 50, 1050 Bruxelles, Belgium.
BMC Res Notes. 2012 Jan 19;5:46. doi: 10.1186/1756-0500-5-46.
This paper addresses the modeling of gene expression dynamics away from stationary states, for example in systems subject to external perturbations or during the development of an organism. We base our analysis on experimental data and follow a top-down approach: we start from data on a system's transcriptome and deduce rules and models from it without a priori knowledge. We focus here on a publicly available DNA microarray time series representing the transcriptome of Drosophila over its development from the embryonic to the adult stage.
In the first step, genes were clustered on the basis of similarity of their expression profiles, measured by a translation-invariant and scale-invariant distance that proved appropriate for detecting transitions between development stages. Average profiles representing each cluster were computed and their time evolution was analyzed using coupled differential equations. A linear and several non-linear model structures involving a transcription and a degradation term were tested. The parameters were identified in three steps: determination of the strongest connections between genes, optimization of the parameters defining these connections, and elimination of the unnecessary parameters using various reduction schemes. Different solutions were compared on the basis of their abilities to reproduce the data, to keep realistic gene expression levels when extrapolated in time, to show the biologically expected robustness with respect to parameter variations, and to contain as few parameters as possible.
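The abstract does not give the formula of the translation-invariant and scale-invariant distance. As a minimal sketch of one way to obtain both invariances, the example below assumes each profile is centred to zero mean and rescaled to unit standard deviation before a Euclidean distance is taken. The function names `standardize` and `shape_distance` and the sample profiles are hypothetical and are not taken from the paper.

```python
import numpy as np

def standardize(profile):
    """Centre a profile to zero mean and rescale it to unit standard deviation."""
    p = np.asarray(profile, dtype=float)
    centred = p - p.mean()
    spread = centred.std()
    if spread == 0.0:
        # A flat profile carries no shape information; map it to all zeros.
        return np.zeros_like(centred)
    return centred / spread

def shape_distance(a, b):
    """Euclidean distance between standardized profiles (assumed form, not the paper's)."""
    return float(np.linalg.norm(standardize(a) - standardize(b)))

# Hypothetical expression profiles sampled at four time points.
base = np.array([1.0, 2.0, 4.0, 3.0])
shifted_scaled = 5.0 + 3.0 * base   # same shape, different offset and amplitude
reversed_profile = base[::-1]       # different temporal shape

d_same = shape_distance(base, shifted_scaled)
d_diff = shape_distance(base, reversed_profile)
```

Under this assumption, two profiles that differ only by an additive offset and a positive multiplicative factor are at distance zero up to rounding, so clustering groups genes by the shape of their time course rather than by absolute expression level.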
We showed that the linear model did very well in reproducing the data with few parameters, but was not sufficiently robust and yielded unrealistic values upon extrapolation in time. In contrast, the non-linear models all reached the latter two objectives, but some were unable to reproduce the data. A family of non-linear models, constructed from the exponential of linear combinations of expression levels, reached all the objectives. It defined networks with a mean number of connections equal to two, when restricted to the embryonic time series, and equal to five for the full time series. These networks were compared with experimental data about gene-transcription factor and protein-protein interactions. The non-uniqueness of the solutions was discussed in the context of plasticity and cluster versus single-gene networks.
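To illustrate what a model "constructed from the exponential of linear combinations of expression levels" with a transcription and a degradation term might look like, the sketch below assumes the form dx_i/dt = exp(sum_j W_ij x_j + b_i) - d_i x_i for the average expression x_i of cluster i. This exact form, the parameter names `W`, `b` and `d`, and all numerical values are assumptions made for illustration; the abstract does not state the equations or the fitted parameters.

```python
import numpy as np

def rhs(x, W, b, d):
    """Assumed model: transcription exp(W @ x + b) minus linear degradation d * x."""
    return np.exp(W @ x + b) - d * x

def integrate(x0, W, b, d, t_end, dt=0.01):
    """Integrate the model with a fixed-step fourth-order Runge-Kutta scheme."""
    x = np.asarray(x0, dtype=float)
    steps = int(round(t_end / dt))
    trajectory = [x.copy()]
    for _ in range(steps):
        k1 = rhs(x, W, b, d)
        k2 = rhs(x + 0.5 * dt * k1, W, b, d)
        k3 = rhs(x + 0.5 * dt * k2, W, b, d)
        k4 = rhs(x + dt * k3, W, b, d)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        trajectory.append(x.copy())
    return np.array(trajectory)

# Hypothetical two-cluster network in which each cluster represses the other.
W = np.array([[0.0, -1.0],
              [-1.0, 0.0]])
b = np.array([0.5, 0.2])   # hypothetical basal transcription offsets
d = np.array([1.0, 1.0])   # hypothetical degradation rates

trajectory = integrate([0.1, 0.1], W, b, d, t_end=5.0)
```

In this assumed form the transcription term is always positive, so expression levels that start positive remain positive, and with non-positive weights the levels stay bounded under extrapolation in time. This is consistent with the abstract's remark that the non-linear models keep realistic expression levels, although the paper's actual models may differ in detail.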