Degasperi Andrea, Fey Dirk, Kholodenko Boris N
Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland.
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
NPJ Syst Biol Appl. 2017 Aug 8;3:20. doi: 10.1038/s41540-017-0023-2. eCollection 2017.
Mathematical modelling of signalling pathways aids experimental investigation in systems and synthetic biology. Ever-increasing data availability prompts the development of large dynamic models with numerous parameters. In this paper, we investigate how the number of unknown parameters affects the convergence of three frequently used optimisation algorithms and four objective functions. We compare objective functions that use data-driven normalisation of the simulations with those that use scaling factors. In the data-driven normalisation approach, simulations are normalised in the same way as the data, making the two directly comparable. The scaling factor approach, which is commonly used for parameter estimation in dynamic systems, introduces scaling factors that multiply the simulations to convert them to the scale of the data. Here we show that, compared with data-driven normalisation of the simulations, the scaling factor approach increases the degree of practical non-identifiability, defined as the number of directions in the parameter space along which parameters are not identifiable. Further, the results indicate that data-driven normalisation of the simulations greatly improves the speed of convergence of all tested algorithms when the overall number of unknown parameters is relatively large (74 parameters in our test problems). Data-driven normalisation of the simulations also markedly improves the performance of the non-gradient-based algorithm tested, even when the number of unknown parameters is relatively small (10 parameters in our test problems). As models and the number of unknown parameters grow, the data-driven normalisation approach can be the preferred option, because it does not aggravate non-identifiability and allows parameter estimates to be obtained in a reasonable amount of time.
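To make the contrast between the two kinds of objective function concrete, the following is a minimal sketch, not the authors' implementation. It assumes that data and simulations are normalised by dividing each time course by its maximum (one possible data-driven scheme; the abstract does not specify which is used), and that the scaling factor is a single multiplier per observable obtained from the closed-form least-squares solution. The function names and the toy exponential model are hypothetical illustrations.

```python
import numpy as np


def normalise_by_max(values):
    """Divide a time course by its maximum (assumed normalisation scheme)."""
    return values / np.max(values)


def objective_dns(sim, data_norm):
    """Data-driven normalisation: normalise the simulation exactly like the data,
    then take the sum of squared residuals."""
    return float(np.sum((normalise_by_max(sim) - data_norm) ** 2))


def optimal_scaling_factor(sim, data):
    """Closed-form least-squares alpha minimising sum((alpha * sim - data)**2)."""
    return float(np.dot(sim, data) / np.dot(sim, sim))


def objective_sf(sim, data):
    """Scaling factor approach: multiply the simulation by alpha to bring it
    to the scale of the data; returns (sum of squared residuals, alpha)."""
    alpha = optimal_scaling_factor(sim, data)
    return float(np.sum((alpha * sim - data) ** 2)), alpha


# Hypothetical toy example: measurements in arbitrary units (3.7 x true signal).
t = np.linspace(0.0, 10.0, 11)
data = 3.7 * (1.0 - np.exp(-0.5 * t))
data_norm = normalise_by_max(data)

# Simulation with the right shape (rate 0.5) but an unknown amplitude (2.0).
sim = 2.0 * (1.0 - np.exp(-0.5 * t))
# Simulation with the wrong shape (rate 1.0).
sim_wrong = 2.0 * (1.0 - np.exp(-1.0 * t))

print(objective_dns(sim, data_norm))  # essentially zero
print(objective_sf(sim, data))        # residual essentially zero, alpha = 1.85
```

Both objectives vanish when the simulated shape matches the data, whatever the simulated amplitude. In this sketch alpha is computed analytically; if it were instead estimated as an extra unknown, each observable would add a dimension to the parameter space, which is one way the scaling factor approach can enlarge the set of directions that need to be identified. The normalisation route avoids that extra quantity because the simulation is rescaled by the same rule as the data.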