Nagasaki Masao, Yamaguchi Rui, Yoshida Ryo, Imoto Seiya, Doi Atsushi, Tamada Yoshinori, Matsuno Hiroshi, Miyano Satoru, Higuchi Tomoyuki
Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan.
Genome Inform. 2006;17(1):46-61.
We propose a method for automatically constructing a hybrid functional Petri net as a simulation model of biological pathways. We address two problems: how to choose the parameter values and how to determine the network structure. These unknown factors are usually tuned empirically so that the simulation results agree with biological knowledge, an approach that clearly limits the size of the network that can be handled. To extend the capability of the simulation model, we propose using data assimilation, an approach originally established in geophysical simulation science. Our genomic data assimilation framework links the simulation model to observed data, such as microarray gene expression data, through a nonlinear state space model. The key idea is that the unknown parameters of the simulation model are recast as parameters of the state space model, and their estimates are obtained as maximum a posteriori (MAP) estimators. During parameter estimation, the simulation model is used to generate the system model of the state space model. This formulation lets us handle both model construction and parameter tuning within the framework of Bayesian statistical inference. In particular, the Bayesian approach provides a way to control overfitting during parameter estimation, which is essential for constructing a reliable biological pathway model. We demonstrate the effectiveness of our approach using synthetic data: parameter estimation by genomic data assimilation works very well, and the network structure is suitably selected.
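To make the estimation idea concrete, the following is a minimal sketch of MAP parameter estimation in a nonlinear state space model whose system model is a simulator. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the one-variable production/degradation simulator `simulate_step` (a stand-in for a hybrid functional Petri net), the noise levels, the Gaussian prior, the grid search, and the use of a bootstrap particle filter to approximate the likelihood. The abstract does not say which filtering or optimization algorithm the paper uses.

```python
import math
import random


def simulate_step(x, k, d=0.5, dt=0.1):
    """One Euler step of a toy production/degradation model.

    Hypothetical stand-in for the pathway simulator; k is the unknown
    production rate, d a known degradation rate.
    """
    return x + dt * (k - d * x)


def make_synthetic(k_true, n_steps, rng, sys_sd=0.05, obs_sd=0.2):
    """Generate noisy observations from the toy simulator."""
    x, ys = 0.0, []
    for _ in range(n_steps):
        x = simulate_step(x, k_true) + rng.gauss(0.0, sys_sd)  # system model
        ys.append(x + rng.gauss(0.0, obs_sd))                  # observation model
    return ys


def log_likelihood(k, ys, n_particles=200, sys_sd=0.05, obs_sd=0.2, seed=0):
    """Bootstrap particle filter approximation of log p(y_1:T | k)."""
    rng = random.Random(seed)  # same seed for every k keeps comparisons stable
    particles = [0.0] * n_particles
    norm = obs_sd * math.sqrt(2.0 * math.pi)
    total = 0.0
    for y in ys:
        # Propagate each particle through the simulator plus system noise.
        particles = [simulate_step(x, k) + rng.gauss(0.0, sys_sd) for x in particles]
        # Weight each particle by the Gaussian observation density.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) / norm for x in particles]
        mean_w = sum(weights) / n_particles
        if mean_w == 0.0:
            return -math.inf  # all weights underflowed: k is incompatible with y
        total += math.log(mean_w)
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return total


def log_prior(k, mean=1.0, sd=1.0):
    """Assumed Gaussian prior on k (up to an additive constant)."""
    return -0.5 * ((k - mean) / sd) ** 2


def map_estimate(ys, grid):
    """Return the grid value maximizing log-likelihood plus log-prior."""
    return max(grid, key=lambda k: log_likelihood(k, ys) + log_prior(k))


rng = random.Random(1)
ys = make_synthetic(k_true=2.0, n_steps=50, rng=rng)
grid = [0.5 + 0.25 * i for i in range(13)]  # candidate values 0.5 .. 3.5
print("MAP estimate of k:", map_estimate(ys, grid))
```

The prior term is what the abstract refers to as controlling overfitting: it penalizes parameter values that the data support only weakly. In the same spirit, candidate network structures could in principle be compared by running the procedure once per structure and comparing the resulting scores, though the criterion the paper actually uses is not stated in the abstract.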