Basile Walter, Sachenkova Oxana, Light Sara, Elofsson Arne
Science for Life Laboratory, Stockholm University, Solna, Sweden.
Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden.
PLoS Comput Biol. 2017 Mar 29;13(3):e1005375. doi: 10.1371/journal.pcbi.1005375. eCollection 2017 Mar.
De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. At a bare minimum, these orphan proteins must not cause serious harm to the organism; for instance, they should not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties of proteins of different ages. However, when we take GC content into account, we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the frequency of codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
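To make the GC argument concrete, the sketch below computes the expected amino-acid composition of a random ORF at a given GC level and the share of that composition made up of disorder-promoting residues. It is a minimal illustration under stated assumptions, not the authors' method: it assumes each nucleotide is drawn independently with P(G) = P(C) = GC/2, that stop codons are simply excluded, and it uses one commonly cited disorder-promoting residue set (A, R, G, Q, S, P, E, K) as a hypothetical stand-in for a real disorder predictor. The names `expected_composition` and `disorder_promoting_fraction` are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1): amino acids listed in codon order
# TTT, TTC, TTA, TTG, TCT, ... with bases ordered T, C, A, G; '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: one commonly cited set of disorder-promoting residues.
# This is an illustrative choice, not the classification used in the paper.
DISORDER_PROMOTING = set("ARGQSPEK")


def codon_probability(codon: str, gc: float) -> float:
    """Probability of a codon when each base is drawn independently at GC fraction `gc`."""
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in codon:
        prob *= base_prob[base]
    return prob


def expected_composition(gc: float) -> dict[str, float]:
    """Expected amino-acid frequencies of a random ORF, with stop codons excluded."""
    comp: dict[str, float] = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # an ORF has no internal stops, so condition on sense codons
        comp[aa] = comp.get(aa, 0.0) + codon_probability(codon, gc)
    total = sum(comp.values())
    return {aa: freq / total for aa, freq in comp.items()}


def disorder_promoting_fraction(gc: float) -> float:
    """Expected share of disorder-promoting residues in a random ORF at this GC level."""
    comp = expected_composition(gc)
    return sum(freq for aa, freq in comp.items() if aa in DISORDER_PROMOTING)


if __name__ == "__main__":
    for gc in (0.35, 0.45, 0.55, 0.65):
        print(f"GC={gc:.2f}  disorder-promoting fraction={disorder_promoting_fraction(gc):.3f}")
```

Under these assumptions the fraction rises from about 0.37 at 35% GC to about 0.62 at 65% GC, because GC-rich codons (GCN, CGN, GGN, CCN) all encode residues in the assumed disorder-promoting set. This is only a composition-based proxy; it does not predict disorder in real sequences.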