Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany.
Genome Res. 2023 Jun;33(6):872-890. doi: 10.1101/gr.277482.122. Epub 2023 Jul 13.
Novel genes are essential for evolutionary innovations and differ substantially even between closely related species. Recently, multiple studies across many taxa have shown that some novel genes arise de novo, that is, from previously noncoding DNA. To characterize the mutations that enabled de novo gene emergence and the order in which they occurred, homologous regions must be detected within noncoding sequences of closely related sister genomes. So far, most studies have not detected noncoding homologs of de novo genes because of incomplete assemblies and annotations and the long evolutionary distances separating the genomes compared. Here, we overcome these issues by searching for de novo expressed open reading frames (neORFs), the not-yet-fixed precursors of de novo genes that emerged within a single species. We sequenced and assembled genomes with long-read technology and the corresponding transcriptomes from inbred lines of , derived from seven geographically diverse populations. We found line-specific neORFs in abundance but few neORFs shared among lines, suggesting rapid turnover. Gain and loss of transcription are more frequent than the creation of ORFs, for example, through the formation of new start and stop codons. Consequently, ORF gain becomes rate-limiting and is frequently the initial step in neORF emergence. Furthermore, transposable elements (TEs) are major drivers of intragenomic duplications of neORFs, yet TE insertions are less important for the emergence of neORFs. However, the highly mutable genomic regions around TEs provide new features that enable gene birth. In conclusion, neORFs have a high birth-death rate and are rapidly purged, but surviving neORFs spread neutrally through populations and within genomes.
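The abstract notes that ORFs can be created when mutations form new start or stop codons. The following is a minimal sketch of that idea under simplifying assumptions, not the authors' pipeline: an ORF is taken to be an ATG followed by the first in-frame TAA, TAG, or TGA on the forward strand only, `min_codons` is a hypothetical length cutoff, and the two sequences are invented toy examples in which a single C->T substitution turns ACG into ATG upstream of an already present in-frame stop codon.

```python
# Minimal sketch (assumption): a simplified ORF definition, not the study's pipeline.
# An ORF is an ATG followed by the first in-frame TAA/TAG/TGA on the forward strand.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    Each ORF spans from an ATG through the first in-frame stop codon.
    ATGs inside an already-open ORF in the same frame are not reported separately.
    min_codons is a hypothetical length cutoff that counts the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and keep scanning this frame
    return sorted(orfs)


# Toy, hypothetical sequences: one C->T substitution at index 3 turns ACG into ATG.
ancestral = "GGACGAAACCCGGGTAAGC"
derived = "GGATGAAACCCGGGTAAGC"

print(find_orfs(ancestral))  # → []: no start codon, so no ORF
print(find_orfs(derived))    # → [(2, 17)]: ATG at index 2 reaches the in-frame TAA
```

Because the toy ancestral sequence already carries the in-frame TAA, one substitution is enough to create an ORF. A real analysis would also have to scan the reverse strand and, as the abstract stresses, require transcription evidence before calling a neORF.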