Département de biologie, École normale supérieure, Institut de Biologie de l'ENS (IBENS), CNRS, INSERM, Paris, France.
Laboratoire de Biologie du Développement, Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Paris, France.
Mol Biol Evol. 2023 Mar 4;40(3). doi: 10.1093/molbev/msad042.
Amino acids evolve at different speeds within protein sequences because their functional and structural roles differ. Notably, amino acids located at the surface of proteins are known to evolve more rapidly than those in the core. In particular, amino acids at the N- and C-termini of protein sequences are likely to be more exposed than those in the core of the folded protein because of their location in the peptide chain, and they are known to be less structured. For these reasons, we would expect amino acids located at protein termini to evolve faster than residues located inside the chain. Here we tested this hypothesis and found that amino acids evolve almost twice as fast at protein termini as in the center, hinting at a strong topological bias along the sequence length. We further show that the distribution of solvent-accessible residues and functional domains in proteins readily explains why structural and functional constraints are weaker at their termini, leading to the observed excess of amino acid substitutions. Finally, we show that the specific evolutionary rates at protein termini may have direct consequences, notably by misleading in silico methods used to infer sites under positive selection within genes. These results suggest that accounting for positional information should improve evolutionary models.
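To make the termini-versus-center comparison concrete, here is a minimal sketch that assumes per-site substitution rates have already been estimated for one protein and are listed in sequence order. The function name `termini_vs_center_ratio` and the 10% terminal window are hypothetical choices for illustration, not the analysis pipeline used in the paper. A ratio near 2 would correspond to the "almost twice as fast" pattern.

```python
from statistics import mean


def termini_vs_center_ratio(site_rates, terminal_fraction=0.1):
    """Ratio of the mean per-site rate at the termini to the mean rate in the center.

    Assumption (hypothetical): the N- and C-terminal windows each contain the
    first/last `terminal_fraction` of residues (at least one residue each);
    everything in between counts as the center.
    """
    rates = list(site_rates)
    n = len(rates)
    k = max(1, int(n * terminal_fraction))  # residues per terminal window
    if n < 2 * k + 1:
        raise ValueError("sequence too short to define both termini and a center")
    terminal = rates[:k] + rates[n - k:]  # N-terminal window + C-terminal window
    center = rates[k:n - k]               # residues between the two windows
    return mean(terminal) / mean(center)


# Toy example: 20 sites whose two outermost residues on each end evolve twice as fast.
toy_rates = [2.0] * 2 + [1.0] * 16 + [2.0] * 2
print(termini_vs_center_ratio(toy_rates))
```

Using fixed residue windows keeps the example simple. A real analysis would more likely compare rates across many proteins as a function of relative position along the chain.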