Huber Holly A, Finley Stacey D
bioRxiv. 2025 Jul 9:2025.05.23.655795. doi: 10.1101/2025.05.23.655795.
Computational models in systems biology are often underdetermined; that is, data are scarce relative to the complexity and size of the model. This scarcity stems primarily from limits on our ability to observe specific biological systems, and it restricts the utility of computational models. However, there is a growing number of experimental databases in biology. While these databases provide more observations, they often do not contain observations that match the system of interest exactly. For example, database measurements might be collected under different experimental conditions or on a different scale compared to the system of interest. Here, we investigate what information can be gleaned from generalizing databases across these differences in the context of modeling a specific system: cell signaling. Ultimately, our goal is to better determine models of specific systems, thereby increasing their utility. To do this, we propose a novel, multiscale, probabilistic framework. We use this framework to integrate measurements of protein structure from the Protein Data Bank and measurements of amino acid sequence from the Universal Protein Resource into the parameter inference of cell signaling models. Then, we quantify exactly what information is gained from these measurements when modeling cell signaling. We choose to investigate the utility of these databases in the context of dynamic cell signaling models because experimental measurements of the variables of interest, protein dynamics, are still quite limited. We find that we can successfully integrate measurements from these databases to significantly improve parameter estimation of signaling models. The impact of sequence and structure measurements on model predictions depends on the sensitivity of the prediction to perturbations in the parameter values.
Overall, this study demonstrates that measurements of protein structure and amino acid sequence can be leveraged to better inform parameters in models of cell signaling.
Computational models of cell signaling have provided mechanistic insights into complex biological systems, including in physiological and disease settings. Accurate and predictive modeling critically depends on the precise estimation of model parameters, which is often hindered by the limited availability of experimental data. In this study, we present a novel multiscale probabilistic inference framework that broadens the scope of data types that can be leveraged for parameter estimation in models of cell signaling. The framework integrates a machine learning pipeline with a generalizable parameter inference approach, enabling the use of experimental data across scales. Specifically, we demonstrate that incorporating protein amino acid sequence and 3D structural data enhances parameter estimation compared to traditional measurements such as protein concentrations over time. Improving parameter estimation increases the robustness and applicability of cell signaling models. Ultimately, our framework facilitates the use of a broader range of data and supports the development of predictive computational models that increase our understanding of cell signaling.
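The central idea, that information from outside the time-course experiment can constrain a parameter the time-course data leave poorly determined, can be illustrated with a toy Bayesian calculation. This is a minimal sketch and not the authors' framework: the one-parameter decay model, the noise level `sigma`, the grid bounds, and the Gaussian prior standing in for a database-derived prediction are all assumptions made for illustration.

```python
import math

def simulate(k, times):
    """Toy readout: first-order decay of an activated protein (assumed model)."""
    return [math.exp(-k * t) for t in times]

def posterior_stats(times, data, sigma, log_prior):
    """Grid posterior over x = log10(k); returns (mean, std) of x."""
    grid = [-3 + 4 * i / 400 for i in range(401)]  # x from -3 to 1
    log_post = []
    for x in grid:
        pred = simulate(10 ** x, times)
        # Gaussian log-likelihood, up to an additive constant
        log_like = -sum((d - p) ** 2 for d, p in zip(data, pred)) / (2 * sigma ** 2)
        log_post.append(log_like + log_prior(x))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # unnormalized posterior
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, grid)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, grid)) / total
    return mean, math.sqrt(var)

# Sparse, early time points: the decay has barely started, so k is
# poorly determined by these data alone.
times = [0.5, 1.0]
data = simulate(0.1, times)  # noise-free synthetic data, true k = 0.1
sigma = 0.2                  # assumed measurement noise

def flat_prior(x):
    """Uninformative prior: uniform over the grid."""
    return 0.0

def informed_prior(x):
    """Hypothetical stand-in for a sequence/structure-based prediction of log10(k)."""
    return -((x + 0.8) ** 2) / (2 * 0.3 ** 2)

flat_mean, flat_std = posterior_stats(times, data, sigma, flat_prior)
informed_mean, informed_std = posterior_stats(times, data, sigma, informed_prior)

print(f"flat prior:     log10(k) = {flat_mean:.2f} +/- {flat_std:.2f}")
print(f"informed prior: log10(k) = {informed_mean:.2f} +/- {informed_std:.2f}")
```

With the flat prior the posterior spreads across most of the grid, because every small value of `k` fits the two early time points about equally well. Adding the informative prior narrows it considerably. How much that narrowing matters for a given prediction depends on how sensitive the prediction is to `k`, which mirrors the sensitivity dependence noted in the abstract.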