Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna A-1030, Austria.
Proc Natl Acad Sci U S A. 2020 Mar 17;117(11):5907-5912. doi: 10.1073/pnas.1911203117. Epub 2020 Mar 3.
Frameshifts in protein coding sequences are widely perceived as resulting in either nonfunctional or even deleterious protein products. Indeed, frameshifts typically lead to markedly altered protein sequences and premature stop codons. By analyzing complete proteomes from all three domains of life, we demonstrate that, in contrast, several key physicochemical properties of protein sequences exhibit significant robustness against +1 and -1 frameshifts. In particular, we show that hydrophobicity profiles of many protein sequences remain largely invariant upon frameshifting. For example, over 2,900 human proteins exhibit a Pearson's correlation coefficient R between the hydrophobicity profiles of the original and the +1-frameshifted variants greater than 0.7, despite an average sequence identity between the two of only 6.5% in this group. We observe a similar effect for protein sequence profiles of affinity for certain nucleobases as well as protein sequence profiles of intrinsic disorder. Finally, analysis of significance and optimality demonstrates that frameshift stability is embedded in the structure of the universal genetic code and may have contributed to shaping it. Our results suggest that frameshifting may be a powerful evolutionary mechanism for creating new proteins with vastly different sequences, yet similar physicochemical properties to the proteins from which they originate.
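The comparison described in the abstract, correlating the hydrophobicity profile of a protein with that of its +1-frameshifted variant, can be illustrated with a minimal sketch. Several choices here are assumptions and are not taken from the abstract: the Kyte-Doolittle hydrophobicity scale, reading the +1 frame from nucleotide offset 1, pairing codon i of the original frame with codon i of the shifted frame, and dropping any position where either frame has a stop codon. The function names (`translate`, `pearson`, `frameshift_correlation`) are hypothetical, and no profile smoothing is applied.

```python
from itertools import product
from math import sqrt

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna, offset=0):
    """Translate every complete codon starting at `offset`; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(vx * vy)


def frameshift_correlation(dna, shift=1):
    """Correlate hydrophobicity profiles of frame 0 and the frame at `shift`.

    Assumption: positions are paired codon by codon, and any pair in which
    either frame encodes a stop codon is skipped.
    """
    original = translate(dna, 0)
    shifted = translate(dna, shift)
    pairs = [
        (KD[a], KD[b])
        for a, b in zip(original, shifted)
        if a != "*" and b != "*"
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


if __name__ == "__main__":
    # Arbitrary illustrative sequence, not data from the study.
    example = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(translate(example, 0))
    print(translate(example, 1))
    print(frameshift_correlation(example, shift=1))
```

Applied to a full coding sequence, `frameshift_correlation` returns a value that can be compared against the 0.7 threshold quoted above. Results from this sketch will not reproduce the reported numbers exactly, since the scale and the handling of stop codons are assumed.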