University of Copenhagen, Copenhagen, Denmark.
Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
Semin Oncol. 2019 Feb;46(1):3-9. doi: 10.1053/j.seminoncol.2018.11.002. Epub 2018 Dec 8.
Following Liebeskind et al. [1], we sought consensus ages for the protein-coding and noncoding genes of the human genome, using publicly available ortholog databases. For each database separately, we estimated the age of each gene it listed by identifying the earliest ortholog of that gene. We assigned these ages to one of the 19 major phylostrata defined by Domazet-Loso and Tautz [2], two of which were further subdivided. Across the databases' estimates for a gene, we took the modal value, if one was present, as the consensus age of that gene. For genes where no consensus value could be found, we recorded the median of the age estimates across the databases interrogated. We present a resource that lists the age, so defined, of every one of the 19,660 protein-coding genes and of 5,981 of the 16,528 non-protein-coding genes of the human genome, the age being the time when the gene was accreted to the evolving human genome. We calculate the number of genes that accreted to the genome, epoch by epoch, and consider the rate at which they accreted.
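The mode-then-median rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `consensus_age` and `genes_per_stratum` are hypothetical, phylostrata are assumed to be encoded as integer ranks (1 = oldest), a mode is assumed to be "present" only when a single value is strictly most frequent and occurs more than once, and the low median is assumed so that the fallback is always an actual stratum rank.

```python
from collections import Counter
from statistics import median_low


def consensus_age(estimates):
    """Return (age, method) for one gene.

    `estimates` holds one phylostratum rank per database that lists the gene.
    Assumption: a mode counts as present only if exactly one value is the
    most frequent and it appears at least twice.
    """
    if not estimates:
        raise ValueError("need at least one age estimate")
    counts = Counter(estimates).most_common()
    top_count = counts[0][1]
    leaders = [value for value, count in counts if count == top_count]
    if top_count > 1 and len(leaders) == 1:
        return leaders[0], "mode"
    # No unique mode: fall back to the median across databases.
    # Assumption: median_low keeps the result on an existing integer rank.
    return median_low(estimates), "median"


def genes_per_stratum(gene_estimates):
    """Count how many genes are assigned to each phylostratum rank.

    `gene_estimates` maps a gene identifier to its list of estimates.
    """
    return Counter(consensus_age(e)[0] for e in gene_estimates.values())
```

For example, `consensus_age([3, 3, 5, 7])` selects rank 3 by mode, whereas `consensus_age([2, 2, 5, 5])` has tied leaders and falls back to the median. Converting the per-stratum counts into accretion rates would additionally require the duration of each epoch, which is not given here.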