Hoh Celine, Salzberg Steven L
Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA.
Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA.
bioRxiv. 2024 May 4:2024.05.02.592247. doi: 10.1101/2024.05.02.592247.
The rapid growth in the number of sequenced genomes makes it possible to search for the appearance of entirely new introns in the human lineage. In this study, we compared the genomic sequences of 19,120 human protein-coding genes to a collection of 3,493 vertebrate genomes, mapping the patterns of intron alignments onto a phylogenetic tree. This mapping allowed us to trace many intron gain events to precise locations in the tree, corresponding to distinct points in evolutionary history. We discovered 584 intron gain events, all of them relatively recent, in 514 distinct human genes. Among these events, we explored the hypothesis that intronization was the mechanism responsible for intron gain. Intronization events were identified by locating instances where human introns correspond to exonic sequences in homologous vertebrate genes. Although intronization appears to be rare, we found three compelling cases, and for each of them we compared the human protein sequence and structure to those of homologous genes that lack the intron.
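To make the tree-mapping idea concrete, the following is a minimal sketch of how an intron's presence or absence across species could be placed on a phylogeny. It is not the authors' pipeline: the toy topology in `PARENT`, the function names, and the rule used here (assign the gain to the deepest node that is an ancestor of every species carrying the intron) are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy tree stored as child -> parent (the root maps to None).
# The topology and node names are illustrative only, not taken from the paper.
PARENT: Dict[str, Optional[str]] = {
    "vertebrates": None,
    "zebrafish": "vertebrates",
    "amniotes": "vertebrates",
    "chicken": "amniotes",
    "mammals": "amniotes",
    "mouse": "mammals",
    "primates": "mammals",
    "macaque": "primates",
    "human": "primates",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list of nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def infer_gain_node(present: Set[str], parent: Dict[str, Optional[str]]) -> str:
    """Return the deepest node that is an ancestor of (or equal to) every
    species in which the intron is present. Under the assumption of a single
    gain, the gain falls on the branch leading to this node."""
    if not present:
        raise ValueError("intron is not present in any species")
    leaves = sorted(present)
    first_path = path_to_root(leaves[0], parent)
    other_ancestors = [set(path_to_root(leaf, parent)) for leaf in leaves[1:]]
    # first_path runs from leaf to root, so the first shared node is the deepest.
    for node in first_path:
        if all(node in ancestors for ancestors in other_ancestors):
            return node
    raise ValueError("species do not share a common ancestor in this tree")


def leaves_under(node: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return all leaf species that descend from `node`."""
    internal = {p for p in parent.values() if p is not None}
    return {
        leaf
        for leaf in parent
        if leaf not in internal and node in path_to_root(leaf, parent)
    }


def lacking_below_gain(present: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Species below the inferred gain node that lack the intron; under this
    simple model each would imply a secondary loss or a missed alignment."""
    gain = infer_gain_node(present, parent)
    return leaves_under(gain, parent) - present
```

With this toy tree, an intron found only in human and macaque is assigned to the `primates` node, and an intron found only in human is assigned to the human branch itself. If the intron were found in human and mouse but not macaque, the gain would be placed at `mammals` and `lacking_below_gain` would flag macaque as a case needing a second explanation.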