Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA.
BMC Ecol Evol. 2021 Mar 10;21(1):39. doi: 10.1186/s12862-021-01770-4.
Recovering the historical patterns of selection acting on a protein-coding sequence is a major goal of evolutionary biology. Mutation-selection models address this problem by explicitly modelling fixation rates as a function of site-specific amino acid fitness values. However, their utility for investigating directional evolution is limited because they require prior knowledge of where in the lineages of a phylogeny the fitness changes occurred.
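To make the idea of fixation rates as a function of site-specific fitness concrete, the sketch below uses the form common in the mutation-selection literature, in which the substitution rate from codon i to codon j is the mutation rate multiplied by a relative fixation factor S / (1 - exp(-S)), with S the scaled fitness difference. The abstract does not state the exact parameterisation used in the paper, so this formula and the names `fixation_factor` and `substitution_rate` are illustrative assumptions.

```python
import math


def fixation_factor(s):
    """Relative fixation rate of a mutation with scaled selection
    coefficient s, compared with a neutral mutation (assumed form)."""
    if s == 0.0:
        return 1.0  # neutral limit of s / (1 - exp(-s))
    # -expm1(-s) equals 1 - exp(-s) and stays accurate for small |s|
    return s / -math.expm1(-s)


def substitution_rate(mu_ij, fitness_i, fitness_j):
    """Rate of substitution i -> j at one site: mutation rate times the
    fixation factor for the scaled fitness difference F_j - F_i."""
    return mu_ij * fixation_factor(fitness_j - fitness_i)


if __name__ == "__main__":
    # Hypothetical scaled fitness values for two amino acids at one site.
    print(substitution_rate(1.0, 0.0, 2.0))  # beneficial change, above neutral
    print(substitution_rate(1.0, 2.0, 0.0))  # deleterious change, below neutral
```

Under this form, beneficial changes fix faster than neutral ones and deleterious changes more slowly, and the ratio of forward to reverse fixation factors is exp(S).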
We apply a modified mutation-selection methodology that relaxes the assumptions of equilibrium and time-reversibility. Our implementation allows us to identify branches where adaptive or compensatory shifts in the fitness landscape have taken place, signalled by a change in amino acid fitness profiles. Through simulation and analysis of an empirical data set of β-lactamase genes, we test our ability to recover the position of adaptive events within the tree and to reconstruct the initial codon frequencies and fitness profile parameters generated under the non-stationary model.
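The consequence of relaxing the equilibrium assumption can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes a three-state site with equal mutation rates, the common S / (1 - exp(-S)) fixation factor, and hypothetical fitness values. It starts a branch at the equilibrium of an old fitness profile and propagates state frequencies under a shifted profile, showing that frequencies on a short branch still resemble the old equilibrium.

```python
import math


def fixation_factor(s):
    """Relative fixation rate for scaled selection coefficient s (assumed form)."""
    return 1.0 if s == 0.0 else s / -math.expm1(-s)


def rate_matrix(fitness, mu=1.0):
    """Rate matrix with equal mutation rate mu between all pairs of states."""
    n = len(fitness)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = mu * fixation_factor(fitness[j] - fitness[i])
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so rows sum to zero
    return q


def equilibrium(fitness):
    """Stationary frequencies under equal mutation rates: proportional to exp(F)."""
    weights = [math.exp(f) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def propagate(freqs, q, t, steps=10000):
    """Advance frequencies along a branch of length t with small Euler steps."""
    dt = t / steps
    p = list(freqs)
    n = len(p)
    for _ in range(steps):
        p = [p[j] + dt * sum(p[i] * q[i][j] for i in range(n)) for j in range(n)]
    return p


if __name__ == "__main__":
    old_fitness = [2.0, 0.0, 0.0]  # hypothetical profile before the shift
    new_fitness = [0.0, 2.0, 0.0]  # hypothetical profile after the shift
    start = equilibrium(old_fitness)
    q_new = rate_matrix(new_fitness)
    print("short branch:", propagate(start, q_new, 0.05))
    print("long branch: ", propagate(start, q_new, 20.0))
    print("new equilibrium:", equilibrium(new_fitness))
```

A model that assumes the branch begins at the new equilibrium would misrepresent the short-branch frequencies in this toy setting, which is the kind of error that non-stationary estimation of initial frequencies is meant to avoid.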
We demonstrate successful detection of selective shifts and identification of the affected branch on partitions of 300 codons or more. We successfully reconstruct fitness parameters and initial codon frequencies in simulated data and demonstrate that failing to account for non-equilibrium evolution can increase the error in fitness profile estimation. We also demonstrate reconstruction of plausible shifts in amino acid fitnesses in the bacterial β-lactamase family and discuss some caveats for interpretation.