Smurfit Institute of Genetics, Trinity College Dublin, University of Dublin, Dublin, Ireland.
Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, United States.
eLife. 2020 Feb 18;9:e53500. doi: 10.7554/eLife.53500.
The origin of 'orphan' genes, species-specific sequences that lack detectable homologues, has remained mysterious since the dawn of the genomic era. There are two dominant explanations for orphan genes: complete sequence divergence from ancestral genes, such that homologues are not readily detectable; and de novo emergence from ancestral non-genic sequences, such that homologues genuinely do not exist. The relative contribution of the two processes remains unknown. Here, we harness the special circumstance of conserved synteny to estimate the contribution of complete divergence to the pool of orphan genes. By separately comparing yeast, fly and human genes to related taxa using conservative criteria, we find that complete divergence accounts, on average, for at most a third of eukaryotic orphan and taxonomically restricted genes. We observe that complete divergence occurs at a stable rate within a phylum but at different rates between phyla, and is frequently associated with gene shortening akin to pseudogenization.