Posfai Anna, Zhou Juannan, McCandlish David M, Kinney Justin B
Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724.
Department of Biology, University of Florida, Gainesville, FL, 32611.
bioRxiv. 2024 Jun 24:2024.05.12.593772. doi: 10.1101/2024.05.12.593772.
Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
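As a minimal illustration of the gauge freedom described above, the following sketch uses a toy additive model, assuming a one-hot parameterization with a constant term `theta0` and a position-by-character matrix `theta`. All names, sizes, and random values here are hypothetical and chosen only for illustration. The zero-sum gauge is used because it is a commonly used convention for additive models; the sketch is not meant to reproduce the specific family of gauges derived in the paper.

```python
import itertools
import numpy as np

def predict(theta0, theta, seq):
    """Additive model: constant term plus one contribution per position."""
    return theta0 + sum(theta[l, c] for l, c in enumerate(seq))

def fix_zero_sum_gauge(theta0, theta):
    """Move each position's mean effect into the constant term,
    so that the parameters at every position sum to zero."""
    means = theta.mean(axis=1, keepdims=True)
    return theta0 + means.sum(), theta - means

rng = np.random.default_rng(0)
L, A = 3, 4  # sequence length and alphabet size (illustrative values)
theta0 = 0.5
theta = rng.normal(size=(L, A))

# A gauge transformation: shift every parameter at position l by shift[l]
# and subtract the total shift from the constant term.
shift = rng.normal(size=(L, 1))
theta0_alt = theta0 - shift.sum()
theta_alt = theta + shift

# Evaluate both parameter sets on every possible sequence.
seqs = list(itertools.product(range(A), repeat=L))
preds = np.array([predict(theta0, theta, s) for s in seqs])
preds_alt = np.array([predict(theta0_alt, theta_alt, s) for s in seqs])

# Fix the gauge of both parameter sets.
theta0_fixed, theta_fixed = fix_zero_sum_gauge(theta0, theta)
theta0_alt_fixed, theta_alt_fixed = fix_zero_sum_gauge(theta0_alt, theta_alt)

print(np.allclose(preds, preds_alt))              # same predictions
print(np.allclose(theta, theta_alt))              # different parameters
print(np.allclose(theta_fixed, theta_alt_fixed))  # same gauge-fixed parameters
```

The two parameter sets differ yet give identical predictions on every sequence, so their raw values cannot be interpreted directly. After gauge fixing, both collapse to the same parameters, which is what makes the values comparable.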