Geier Florian, Timmer Jens, Fleck Christian
Institute of Physics, University of Freiburg, Hermann-Herder-Str. 3, 79104 Freiburg, Germany.
BMC Syst Biol. 2007 Feb 2;1:11. doi: 10.1186/1752-0509-1-11.
Cellular processes are controlled by gene-regulatory networks. Several computational methods are currently used to learn the structure of gene-regulatory networks from data. This study focusses on time series gene expression and gene knock-out data in order to identify the underlying network structure. We compare the performance of different network reconstruction methods using synthetic data generated from an ensemble of reference networks. Data requirements as well as optimal experiments for the reconstruction of gene-regulatory networks are investigated. Additionally, the impact of prior knowledge on network reconstruction as well as the effect of unobserved cellular processes is studied.
We identify linear Gaussian dynamic Bayesian networks and variable selection based on F-statistics as suitable methods for the reconstruction of gene-regulatory networks from time series data. Commonly used discrete dynamic Bayesian networks perform worse, a result that can be attributed to the inevitable loss of information when expression data are discretized. Short time series generated under transcription factor knock-out are shown to be the optimal experiments for revealing the structure of gene-regulatory networks. We give estimates, relative to the level of observational noise, of the amount of gene expression data required to reconstruct gene-regulatory networks accurately. The benefit of using prior knowledge within a Bayesian learning framework is found to be limited to cases where little gene expression data is available. Unobserved processes, such as protein-protein interactions, induce dependencies between gene expression levels similar to those caused by direct transcriptional regulation. We show that these dependencies cannot be distinguished from transcription factor mediated gene regulation on the basis of gene expression data alone.
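As an illustration of what variable selection based on F-statistics can look like on time series data, the following minimal sketch simulates a small linear network with Gaussian noise and tests each candidate regulator with a partial F-test. It is not the procedure used in the study: the first-order linear model, the example matrix `A_true`, the noise level, the series length, the fixed cutoff `F_THRESHOLD`, and all function names are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_linear_network(A, n_steps, noise_sd, rng):
    """Simulate x[t] = A @ x[t-1] + Gaussian noise, starting from zero."""
    n_genes = A.shape[0]
    x = np.zeros((n_steps, n_genes))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + rng.normal(scale=noise_sd, size=n_genes)
    return x


def residual_ss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_f_matrix(x):
    """Partial F-statistic for every (target, regulator) pair.

    Each target gene at time t+1 is regressed on all genes at time t
    (plus an intercept). The statistic for one regulator compares the
    full model with the model that leaves that regulator out.
    """
    past, future = x[:-1], x[1:]
    n_obs, n_genes = past.shape
    full = np.column_stack([np.ones(n_obs), past])
    df_resid = n_obs - full.shape[1]
    f_stats = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        y = future[:, target]
        rss_full = residual_ss(full, y)
        for reg in range(n_genes):
            # Column 0 is the intercept, so regulator `reg` is column reg + 1.
            reduced = np.delete(full, reg + 1, axis=1)
            rss_reduced = residual_ss(reduced, y)
            f_stats[target, reg] = (rss_reduced - rss_full) / (rss_full / df_resid)
    return f_stats


# Hypothetical 4-gene reference network: A_true[i, j] != 0 means gene j regulates gene i.
A_true = np.array([
    [0.5,  0.0, 0.0, 0.0],
    [0.8,  0.0, 0.0, 0.0],
    [0.0, -0.7, 0.5, 0.0],
    [0.0,  0.0, 0.6, 0.0],
])

rng = np.random.default_rng(0)
data = simulate_linear_network(A_true, n_steps=2000, noise_sd=0.1, rng=rng)

f_stats = partial_f_matrix(data)
F_THRESHOLD = 20.0  # assumed, deliberately conservative cutoff
A_est = f_stats > F_THRESHOLD
print(A_est.astype(int))
```

A fixed cutoff keeps the sketch free of extra dependencies. In practice the threshold would more plausibly be derived from the F-distribution at a chosen significance level and corrected for the number of tests. A drop-one test on the full model is also only practical when there are many more time points than genes.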
Currently available data size and data quality make the reconstruction of gene networks from gene expression data a challenge. In this study, we identify an optimal type of experiment, requirements on the gene expression data quality and size as well as appropriate reconstruction methods in order to reverse engineer gene regulatory networks from time series data.