Oguz Cihan, Laomettachit Teeraphan, Chen Katherine C, Watson Layne T, Baumann William T, Tyson John J
Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA.
BMC Syst Biol. 2013 Jun 28;7:53. doi: 10.1186/1752-0509-7-53.
Parameter estimation from experimental data is critical for mathematical modeling of protein regulatory networks. For realistic networks with dozens of species and reactions, parameter estimation is an especially challenging task. In this study, we present an approach for parameter estimation that is effective in fitting a model of the budding yeast cell cycle (comprising 26 nonlinear ordinary differential equations containing 126 rate constants) to the experimentally observed phenotypes (viable or inviable) of 119 genetic strains carrying mutations of cell cycle genes.
Starting from an initial guess of the parameter values, which correctly captures the phenotypes of only 72 genetic strains, our parameter estimation algorithm quickly improves the success rate of the model to 105-111 of the 119 strains. This success rate is comparable to the best values achieved by a skilled modeler manually choosing parameters over many weeks. The algorithm combines two search and optimization strategies. First, we use Latin hypercube sampling to explore a region surrounding the initial guess. From these samples, we choose ∼20 different sets of parameter values that correctly capture wild type viability. These sets form the starting generation for differential evolution, which selects new parameter values that achieve a higher success rate in capturing phenotypes. Beyond producing highly successful combinations of parameter values, we analyze the results to determine the parameters that are most critical for matching experimental outcomes, and the most competitive strains, that is, strains whose correct outcome with a given parameter vector forces numerous other strains to have incorrect outcomes. These "most critical parameters" and "most competitive strains" provide biological insights into the model. Conversely, the "least critical parameters" and "least competitive strains" suggest ways to reduce the computational complexity of the optimization.
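The two-stage search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy "strains", the three-parameter initial guess, the fourfold sampling range, and the settings `f`, `cr`, and the generation count are all hypothetical stand-ins, and a simple interval test replaces simulation of the 26-equation model. The differential evolution variant shown is the classic rand/1/bin scheme, which the abstract does not specify, and taking absolute values to keep rate constants non-negative is likewise an assumption.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]

def sample_around(guess, fold, n_samples, rng):
    """Scale each parameter of `guess` log-uniformly by a factor in [1/fold, fold)."""
    unit = latin_hypercube(n_samples, len(guess), rng)
    return [[g * fold ** (2.0 * u - 1.0) for g, u in zip(guess, row)] for row in unit]

def differential_evolution(score, population, generations, rng, f=0.5, cr=0.9):
    """Maximize `score` with rand/1/bin; needs a population of at least 4."""
    pop = [list(p) for p in population]
    fitness = [score(p) for p in pop]
    n, dims = len(pop), len(pop[0])
    for _ in range(generations):
        for i in range(n):
            a, b, c = rng.sample([j for j in range(n) if j != i], 3)
            forced = rng.randrange(dims)  # at least one coordinate always mutates
            trial = []
            for d in range(dims):
                if d == forced or rng.random() < cr:
                    # abs() keeps rate constants non-negative (an assumption)
                    trial.append(abs(pop[a][d] + f * (pop[b][d] - pop[c][d])))
                else:
                    trial.append(pop[i][d])
            trial_fit = score(trial)
            if trial_fit >= fitness[i]:  # keep the trial only if it is no worse
                pop[i], fitness[i] = trial, trial_fit
    return pop, fitness

# Hypothetical stand-in for simulating the model: each toy "strain" is
# (parameter index, low, high) and counts as captured when that parameter
# lies in the interval. Strain 0 plays the role of wild type.
TOY_STRAINS = [(0, 0.5, 2.0), (1, 1.0, 3.0), (2, 0.1, 0.4), (0, 0.8, 1.2), (1, 2.0, 2.5)]

def captured(params, strain):
    index, low, high = strain
    return low <= params[index] <= high

def success_count(params):
    """Number of toy strains whose phenotype the parameter vector captures."""
    return sum(captured(params, s) for s in TOY_STRAINS)

rng = random.Random(0)
initial_guess = [1.5, 0.8, 1.0]  # hypothetical starting parameter values
samples = sample_around(initial_guess, fold=4.0, n_samples=200, rng=rng)
# Stage 1: keep only samples that capture the wild-type stand-in.
start = [p for p in samples if captured(p, TOY_STRAINS[0])][:20]
# Stage 2: refine the surviving sets with differential evolution.
final_pop, final_fit = differential_evolution(success_count, start, 50, rng)
print("initial score:", success_count(initial_guess), "best score:", max(final_fit))
```

Because a trial vector replaces its parent only when its score is at least as high, no member's success count ever decreases from one generation to the next. In a realistic setting, `success_count` would run the model once per strain, so the cost of each generation scales with the population size times the number of strains.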
Our approach proves to be a useful tool to help systems biologists fit complex dynamical models to large experimental datasets. In the process of fitting the model to the data, the tool identifies suggestive correlations among aspects of the model and the data.