Institute for Evolution and Biodiversity, University of Muenster, Huefferstrasse 1, 48149 Muenster, Germany.
Genome Biol Evol. 2024 Aug 5;16(8). doi: 10.1093/gbe/evae176.
During de novo emergence, new protein-coding genes arise from previously nongenic sequences. The de novo proteins they encode differ from conserved proteins in composition and predicted biochemical properties. Nevertheless, functional de novo proteins do exist. Both the identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine-learning-based tools and found that most de novo proteins indeed differ from conserved proteins in both structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, to participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study provides a large set of testable hypotheses for focused experimental studies on the structure and function of de novo proteins in Drosophila.