Department of Integrative Evolutionary Biology, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 9, 72076 Tübingen, Germany.
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August Thienemann Str. 2, 24306 Plön, Germany.
G3 (Bethesda). 2019 Jul 9;9(7):2277-2286. doi: 10.1534/g3.119.400326.
Homology is a fundamental concept in comparative biology. It is extensively used at the sequence level to make phylogenetic hypotheses and functional inferences. Nonetheless, the majority of eukaryotic genomes contain large numbers of orphan genes lacking homologs in other taxa. Generally, the fraction of orphan genes is higher in genomically undersampled clades, and in the absence of closely related genomes any hypothesis about their origin and evolution remains untestable. Previously, we sequenced ten genomes with an underlying ladder-like phylogeny to establish a phylogenomic framework for studying genome evolution in diplogastrid nematodes. Here, we use this deeply sampled data set to understand the processes that generate orphan genes in our focal species Pristionchus pacificus. Based on phylostratigraphic analysis and additional bioinformatic filters, we obtained 29 high-confidence candidate genes, for which we propose mechanisms of orphan origin based on manual inspection. This revealed diverse mechanisms including annotation artifacts, chimeric origin, alternative reading frame usage, and gene splitting with subsequent gain of exons. In addition, we present two cases of complete de novo origination from non-coding regions, which represents one of the first reports of de novo genes in nematodes. Thus, we conclude that de novo emergence, divergence, and mixed mechanisms contribute to novel gene formation in Pristionchus nematodes.