Hasegawa Takanori, Mori Tomoya, Yamaguchi Rui, Shimamura Teppei, Miyano Satoru, Imoto Seiya, Akutsu Tatsuya
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Kyoto, 611-0011 Uji, Japan.
Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Tokyo, 108-8639 Minato-ku, Japan.
BMC Syst Biol. 2015 Mar 13;9:14. doi: 10.1186/s12918-015-0154-2.
As a result of recent advances in biotechnology, many findings related to intracellular systems have been published, e.g., transcription factor (TF) information. Although we can reproduce biological systems by incorporating such findings and describing their dynamics as mathematical equations, simulation results can be inconsistent with data from biological observations if the constructed system contains inaccurate or unknown parts. To complete such systems, relationships among genes have been inferred through various computational approaches, which typically apply abstractions, e.g., linearization, to handle the heavy computational cost of evaluating biological systems. However, since these approximations can generate false regulations, computational methods that can infer regulatory relationships based on less abstract models incorporating existing knowledge are strongly needed.
We propose a new data assimilation algorithm that utilizes a simple nonlinear regulatory model and a state space representation to infer gene regulatory networks (GRNs) from time-course observation data. To estimate the hidden state variables and the parameter values, we developed a novel method termed the higher moment ensemble particle filter (HMEnPF), which can retain the first four moments of the conditional distributions through the filtering steps. Starting from an original model, e.g., one derived from the literature, the proposed algorithm sequentially evaluates candidate models, which are generated by partially changing the current best model, to find the model that best predicts the data. For the performance evaluation, we generated six synthetic datasets based on two real biological networks and evaluated the effectiveness of the proposed algorithm by improving the networks inferred by previous methods. We then applied the algorithm to time-course observation data of rat skeletal muscle stimulated with corticosteroid. Since a corticosteroid pharmacogenomic pathway, its kinetics/dynamics, and TF candidate genes have been partially elucidated, we incorporated these findings and inferred an extended pathway of rat pharmacogenomics.
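The sequential evaluation of candidate models described above can be illustrated with a minimal greedy search sketch. It is not the authors' implementation. The names `neighbours`, `greedy_search`, and `toy_score` are hypothetical, candidates are assumed to differ from the current best model by a single edge, and the score is a toy stand-in. In the proposed algorithm the score would instead reflect how well a model predicts the data, as estimated through HMEnPF.

```python
from itertools import product
from typing import Callable, FrozenSet, Iterator, Tuple

Edge = Tuple[int, int]      # (regulator, target); self-loops are allowed here
Model = FrozenSet[Edge]     # a network is represented as a set of directed edges


def neighbours(model: Model, n_genes: int) -> Iterator[Model]:
    """Yield every model that differs from `model` by exactly one edge."""
    for edge in product(range(n_genes), repeat=2):
        # Symmetric difference adds the edge if absent, removes it if present.
        yield model ^ {edge}


def greedy_search(initial: Model, n_genes: int,
                  score: Callable[[Model], float],
                  max_rounds: int = 100) -> Model:
    """Repeatedly move to the best-scoring one-edge change until none improves."""
    best, best_score = initial, score(initial)
    for _ in range(max_rounds):
        candidate = max(neighbours(best, n_genes), key=score)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break           # no candidate beats the current best model
        best, best_score = candidate, candidate_score
    return best


# Toy usage (assumption): the score is the negative number of edges that
# differ from a known "true" network, standing in for predictive ability.
truth: Model = frozenset({(0, 1), (1, 2), (2, 0)})


def toy_score(model: Model) -> float:
    return -float(len(model ^ truth))


start: Model = frozenset({(0, 1), (2, 1)})   # e.g., a literature-derived model
result = greedy_search(start, 3, toy_score)
```

With this toy score every round corrects one wrong edge, so the search stops at `truth`. A data-driven score would generally have local optima, and a greedy search of this kind is not guaranteed to find the best model.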
In the simulation study, the proposed algorithm outperformed previous methods and successfully improved the regulatory structures they had inferred. Furthermore, by incorporating several information sources, the proposed algorithm could extend a partially elucidated corticosteroid-related pathway.