Viroquant Research Group Modeling, University of Heidelberg, Bioquant BQ26, INF 267, D-69120 Heidelberg, Germany.
BMC Bioinformatics. 2009 Dec 28;10:448. doi: 10.1186/1471-2105-10-448.
The reconstruction of gene regulatory networks from time series gene expression data is one of the most difficult problems in systems biology. This is due to several reasons, among them the combinatorial explosion of possible network topologies, limited information content of the experimental data with high levels of noise, and the complexity of gene regulation at the transcriptional, translational and post-translational levels. At the same time, quantitative, dynamic models, ideally with probability distributions over model topologies and parameters, are highly desirable.
We present a novel approach to infer such models from data, based on nonlinear differential equations, which we embed into a stochastic Bayesian framework. We thus address both the stochasticity of experimental data and the need for quantitative dynamic models. Furthermore, the Bayesian framework makes it easy to integrate prior knowledge into the inference process. Using stochastic sampling from the Bayesian posterior distribution, our approach can infer different likely network topologies and model parameters, along with their respective probabilities, from given data. We evaluate our approach on simulated data and on the challenge #3 data from the DREAM 2 initiative. On the simulated data, we study the effects of different noise levels and dataset sizes. Results on real data show that the dynamics and main regulatory interactions are correctly reconstructed.
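To illustrate the general idea of sampling from a posterior distribution over the parameters of a nonlinear differential equation model, the following is a minimal sketch and not the implementation used in this work. Everything in it is an assumption made for illustration: the toy two-gene model with a Hill-type activation, the parameter names `k` (production rate) and `g` (degradation rate), the Gaussian noise model, the uniform prior on (0, 10), the Euler integrator and the random-walk Metropolis sampler. It also keeps the network topology fixed and samples only parameters, whereas the approach described above infers topologies as well.

```python
import math
import random


def simulate(k, g, dt=0.1, steps=50):
    """Euler-integrate a hypothetical two-gene model.

    Gene 1 decays at a fixed rate; gene 2 is activated by gene 1 through
    a Hill function (exponent 2, threshold 0.5) and degrades at rate g.
    Returns the trajectory of gene 2.
    """
    x1, x2 = 1.0, 0.0
    trajectory = []
    for _ in range(steps):
        activation = x1 ** 2 / (x1 ** 2 + 0.5 ** 2)
        x1, x2 = x1 + dt * (-0.2 * x1), x2 + dt * (k * activation - g * x2)
        trajectory.append(x2)
    return trajectory


def log_posterior(params, data, sigma=0.1):
    """Unnormalized log posterior: Gaussian likelihood, uniform prior on (0, 10)."""
    k, g = params
    if not (0.0 < k < 10.0 and 0.0 < g < 10.0):
        return -math.inf  # outside the assumed prior support
    predicted = simulate(k, g)
    sse = sum((p - d) ** 2 for p, d in zip(predicted, data))
    return -sse / (2.0 * sigma ** 2)


def metropolis(data, n_iter=20000, step=0.02, seed=0):
    """Random-walk Metropolis sampler over (k, g)."""
    rng = random.Random(seed)
    current = (1.0, 1.0)
    current_lp = log_posterior(current, data)
    samples = []
    for _ in range(n_iter):
        proposal = tuple(c + rng.gauss(0.0, step) for c in current)
        proposal_lp = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Simulated noisy data from assumed "true" parameters k = 2.0, g = 0.5.
data_rng = random.Random(1)
data = [x + data_rng.gauss(0.0, 0.1) for x in simulate(2.0, 0.5)]

samples = metropolis(data)
kept = samples[len(samples) // 2:]  # discard the first half as burn-in
k_mean = sum(s[0] for s in kept) / len(kept)
g_mean = sum(s[1] for s in kept) / len(kept)
print("posterior mean k:", k_mean)
print("posterior mean g:", g_mean)
```

The retained samples approximate the posterior distribution, so their spread gives an uncertainty estimate for each parameter in addition to a point estimate. Extending such a sketch to network topologies would require proposals that add or remove regulatory interactions, which is not shown here.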
Our approach combines dynamic modeling using differential equations with a stochastic learning framework, thus bridging the gap between biophysical modeling and stochastic inference approaches. The results show that the method combines the advantages of both and allows the reconstruction of biophysically accurate dynamic models from noisy data. In addition, the stochastic learning framework permits the computation of probability distributions over models and model parameters, which holds interesting prospects for experimental design.