Molecular and Cellular Biophysics, University of Denver, Denver, Colorado 80209, United States.
Department of Physics and Astronomy, University of Denver, Denver, Colorado 80209, United States.
J Phys Chem B. 2018 May 31;122(21):5666-5677. doi: 10.1021/acs.jpcb.7b12251. Epub 2018 Feb 15.
Gene networks with feedback often involve interactions among multiple species of biomolecules, many more than experiments can actually monitor. Compounding this, experiments often measure gene expression as noisy fluorescence rather than as protein numbers. How do we infer biophysical information and characterize the underlying circuits from such limited and convoluted data? We address this by building stochastic models using the principle of Maximum Caliber (MaxCal). MaxCal uses only basic information on synthesis, degradation, and feedback, without invoking any auxiliary species or ad hoc reactions, to generate stochastic trajectories similar to those typically measured in experiments. Combined with Maximum Likelihood (ML), MaxCal can infer the model's parameters from fluctuating time traces of protein expression. We demonstrate the success of the MaxCal + ML methodology using synthetic data generated from known circuits of different genetic switches: (i) a single-gene autoactivating circuit involving five species (including mRNA), (ii) a mutually repressing two-gene circuit (toggle switch) with seven species (including mRNA), using stochastic time traces of both proteins, and (iii) the same toggle switch circuit, using stochastic time traces of only one of the two proteins. To further challenge the MaxCal + ML inference scheme, we repeat the analysis for the second and third scenarios with traces expressed as noisy fluorescence instead of protein numbers, to closely mimic typical experiments. We show that, for all of these models of increasing complexity and obfuscation, the minimal MaxCal model still captures the fluctuations of the trajectories and infers the basic underlying rate parameters, as benchmarked against the known values used to generate the synthetic data. Importantly, the model also yields an effective feedback parameter that can be used to quantify interactions within these circuits.
These applications show the promise of MaxCal for characterizing circuits from limited data, and its utility for better understanding evolution and advancing design strategies for specific functions.
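To make the MaxCal + ML idea concrete, here is a toy sketch for a single autoregulated gene. It is not the paper's code or its exact parameterization. It assumes a hypothetical MaxCal-style path weight in which, per time step, `l_syn` of at most `M` synthesis events occur and `l_surv` of the current `n` proteins survive. The weight is C(M, l_syn) C(n, l_surv) exp(h_syn l_syn + h_surv l_surv + k l_syn l_surv), with `k` acting as an effective feedback coupling. Because only protein numbers are observed, the likelihood of each observed step marginalizes over the hidden (l_syn, l_surv) splits. The names `log_weights`, `simulate`, `log_likelihood`, and `fit`, the cap `M`, and the pattern-search optimizer (chosen for brevity) are all illustrative assumptions.

```python
import math
import random

M = 4  # hypothetical cap on synthesis events per time step


def log_binom(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a re-iterable collection."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_weights(n, params):
    """Unnormalized log path weights for each (l_syn, l_surv) given n proteins.

    Assumed MaxCal-style form; k couples synthesis to the surviving pool and
    plays the role of an effective feedback parameter.
    """
    h_syn, h_surv, k = params
    return {
        (ls, lv): log_binom(M, ls) + log_binom(n, lv)
        + h_syn * ls + h_surv * lv + k * ls * lv
        for ls in range(M + 1)
        for lv in range(n + 1)
    }


def step(n, params, rng):
    """Sample one time step; the next protein number is l_surv + l_syn."""
    lw = log_weights(n, params)
    log_z = log_sum_exp(lw.values())
    u, acc = rng.random(), 0.0
    for (ls, lv), w in lw.items():
        acc += math.exp(w - log_z)
        if u <= acc:
            return lv + ls
    return lv + ls  # last entry, guards against floating-point round-off


def simulate(n0, params, steps, seed=0):
    """Generate a synthetic protein-number trajectory of length steps + 1."""
    rng = random.Random(seed)
    traj = [n0]
    for _ in range(steps):
        traj.append(step(traj[-1], params, rng))
    return traj


def log_likelihood(traj, params):
    """Sum of log P(n_next | n), marginalizing over hidden (l_syn, l_surv)."""
    cache = {}
    total = 0.0
    for n, n_next in zip(traj, traj[1:]):
        if n not in cache:
            lw = log_weights(n, params)
            log_z = log_sum_exp(lw.values())
            by_next = {}
            for (ls, lv), w in lw.items():
                by_next.setdefault(ls + lv, []).append(w - log_z)
            cache[n] = {nxt: log_sum_exp(ws) for nxt, ws in by_next.items()}
        total += cache[n].get(n_next, float("-inf"))
    return total


def fit(traj, start, step_size=0.5, tol=1e-3, max_rounds=500):
    """Maximize the log-likelihood with a simple coordinate pattern search."""
    best = list(start)
    best_ll = log_likelihood(traj, best)
    steps = [step_size, step_size, step_size / 10]  # assume k is small
    for _ in range(max_rounds):
        if max(steps) < tol:
            break
        improved = False
        for i in range(len(best)):
            for sign in (1.0, -1.0):
                trial = best.copy()
                trial[i] += sign * steps[i]
                trial_ll = log_likelihood(traj, trial)
                if trial_ll > best_ll:
                    best, best_ll, improved = trial, trial_ll, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
    return best, best_ll
```

For example, `simulate(9, (-1.0, 2.0, 0.02), 300)` produces a synthetic trace from known parameters. `fit(traj, [0.0, 1.0, 0.0])` then returns the parameters found by the search and their log-likelihood, which can be benchmarked against the generating values in the same spirit as the synthetic-data tests described above. Any standard numerical optimizer could replace the pattern search.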