Department of Ecology and Evolution, The University of Chicago, Chicago, IL 60637, USA.
Division of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA.
Genome Biol Evol. 2024 Jun 4;16(6). doi: 10.1093/gbe/evae107.
Recent genome-wide studies in rice have established that de novo genes, which evolve from noncoding sequences, enhance protein diversity through a stepwise process. However, the pattern and rate of their protein structural evolution over time remain unclear. Here, we addressed these issues within a surprisingly short evolutionary timescale (<1 million years for 97% of Oryza de novo genes) by comparing de novo genes with gene duplicates. We found that de novo genes evolve faster than gene duplicates in intrinsically disordered regions (such as random coils), secondary structure elements (such as α helix and β strand), hydrophobicity, and molecular recognition features. Specifically, in de novo proteins we observed an 8% to 14% decay in random coil and intrinsically disordered region lengths and a 2.3% to 6.5% increase in structured elements, hydrophobicity, and molecular recognition features, per million years on average. These patterns of structural evolution are consistent with changes in amino acid composition over time. We also found that de novo proteins carry higher positive charges but have smaller molecular weights than duplicates. Tertiary structure predictions showed that most de novo proteins, though not typically well folded on their own, readily form low-energy, compact complexes with other proteins, facilitated by extensive residue contacts and conformational flexibility, suggesting a faster-binding scenario for de novo proteins that promotes interaction. These analyses illuminate the rapid evolution of protein structure in rice de novo genes originating from noncoding sequences, highlighting their quick transformation into active, protein complex-forming components within a remarkably short evolutionary timeframe.