Institut de Systématique, Evolution, Biodiversité (ISYEB), Sorbonne Université, CNRS, Museum National d'Histoire Naturelle, EPHE, Université des Antilles, Paris, France.
Mol Biol Evol. 2022 Jan 7;39(1). doi: 10.1093/molbev/msab329.
All genomes include gene families with very limited taxonomic distributions that potentially represent new genes and innovations in protein-coding sequence, raising questions on the origins of such genes. Some of these genes are hypothesized to have formed de novo, from noncoding sequences, and recent work has begun to elucidate the processes by which de novo gene formation can occur. A special case of de novo gene formation, overprinting, describes the origin of new genes from noncoding alternative reading frames of existing open reading frames (ORFs). We argue that additionally, out-of-frame gene fission/fusion events of alternative reading frames of ORFs and out-of-frame lateral gene transfers could contribute to the origin of new gene families. To demonstrate this, we developed an original pattern-search in sequence similarity networks, enhancing the use of these graphs, commonly used to detect in-frame remodeled genes. We applied this approach to gene families in 524 complete genomes of Escherichia coli. We identified 767 gene families whose evolutionary history likely included at least one out-of-frame remodeling event. These genes with out-of-frame components represent ∼2.5% of all genes in the E. coli pangenome, suggesting that alternative reading frames of existing ORFs can contribute to a significant proportion of de novo genes in bacteria.
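To make the term "alternative reading frames" concrete, the following minimal Python sketch enumerates the six possible translations of a DNA sequence (three offsets on each strand). It is not the pattern search in sequence similarity networks that the abstract describes; it only illustrates the raw material that overprinting and out-of-frame remodeling would draw on. The function names and the toy sequence are hypothetical, and the codon table is assumed to be the standard genetic code.

```python
from itertools import product

# Standard genetic code in the conventional TCAG codon order (assumption:
# the standard codon assignments apply to the sequence of interest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna: str) -> dict[str, str]:
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical toy ORF, not taken from any E. coli genome.
toy_orf = "ATGGCCATTGTAATGGGCCGCTGA"
frames = six_frame_translations(toy_orf)
for label, protein in frames.items():
    print(label, protein)
```

Frame +1 is the annotated reading frame of the toy ORF; the other five are the alternative frames. A protein family whose sequence matches one of those five, rather than frame +1, is the kind of out-of-frame relationship the abstract refers to.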