Rogozin Igor B, Sverdlov Alexander V, Babenko Vladimir N, Koonin Eugene V
National Center for Biotechnology Information/NLM/NIH, 8600 Rockville Pike, Bldg. 38A, Bethesda, MD 20894, USA.
Brief Bioinform. 2005 Jun;6(2):118-34. doi: 10.1093/bib/6.2.118.
The availability of multiple complete eukaryotic genome sequences allows many fundamental evolutionary questions to be addressed on a genome scale. One such important, long-standing problem is the evolution of the exon-intron structure of eukaryotic genes. Analysis of orthologous genes from completely sequenced genomes revealed numerous intron positions shared between animals and plants, and even among animals, plants and protists. The data on shared and lineage-specific intron positions were used as the starting point for evolutionary reconstruction with parsimony and maximum-likelihood approaches. Parsimony methods produce reconstructions with intron-rich ancestors but also infer lineage-specific and, in many cases, high levels of intron loss and gain. Different probabilistic models gave opposite results, apparently depending on model parameters and assumptions: from domination of intron loss, with extremely intron-rich ancestors, to a dramatic excess of gains, to the point of denying any true conservation of intron positions among deep eukaryotic lineages. Development of models with adequate, realistic parameters and assumptions seems to be crucial for obtaining more definitive estimates of intron gain and loss in different eukaryotic lineages. Many shared intron positions were detected in ancestral eukaryotic paralogues that evolved by duplication prior to the divergence of extant eukaryotic lineages. These findings indicate that numerous introns were already present in eukaryotic genes at the earliest stages of eukaryotic evolution and are compatible with the hypothesis that the original, catastrophic intron invasion accompanied the emergence of the eukaryotic cell. Comparison of various features of old and younger introns is starting to shed light on probable mechanisms of intron insertion, indicating that propagation of old introns is unlikely to be a major mechanism for the origin of new ones.
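To make the parsimony step concrete, the following is a minimal sketch of how gains and losses could be counted for a single intron position. The abstract does not say which parsimony variant was used, so Dollo parsimony (one gain at the last common ancestor of all intron-bearing taxa, with losses explaining every absence below it) is an assumption here. The toy tree, the taxon labels and the function names are hypothetical and are not taken from the paper.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo_reconstruct(tree, present):
    """Place one gain and count losses for one intron position.

    tree    : nested tuples of taxon names (hypothetical toy topology)
    present : set of taxa in which the intron position is observed
    Returns (gain_node, number_of_losses) under Dollo parsimony.
    """
    if not present:
        return None, 0  # intron never observed: nothing to reconstruct

    # Descend to the last common ancestor of all intron-bearing taxa;
    # Dollo parsimony places the single gain on the branch leading to it.
    node = tree
    while not isinstance(node, str):
        carriers = [child for child in node if leaves(child) & present]
        if len(carriers) != 1:
            break
        node = carriers[0]

    def count_losses(sub):
        # A subtree with no intron-bearing taxa is explained by one loss
        # on the branch leading to it.
        if not (leaves(sub) & present):
            return 1
        if isinstance(sub, str):
            return 0
        return sum(count_losses(child) for child in sub)

    return node, count_losses(node)


# Toy topology, for illustration only.
toy_tree = (("animal", "fungus"), ("plant", "protist"))

gain_node, losses = dollo_reconstruct(toy_tree, {"animal", "plant"})
print(gain_node == toy_tree, losses)
```

In this toy case an intron shared by the animal and plant taxa is placed at the root and costs two losses (fungus and protist). This shows how a parsimony reconstruction of this kind yields intron-rich ancestors together with substantial lineage-specific loss.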
The existence and structure of ancestral protosplice sites were addressed by examining the sequence context of introns inserted within codons that encode amino acids conserved in all eukaryotes; such codons are, accordingly, not subject to selection for splicing efficiency. Introns were shown to insert predominantly into, or be fixed in, specific protosplice sites with the consensus sequence (A/C)AG|Gt.
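As an illustration of the consensus (A/C)AG|Gt, the sketch below scans a coding sequence for candidate protosplice sites and reports the phase of each potential insertion point. It rests on several assumptions that are not from the paper: the sequence is taken to start in frame at index 0, the lowercase t is treated as weakly conserved and is not required for a match, and the function name and example sequence are hypothetical.

```python
import re

# (A/C)AG|G: the intron would sit between the AG and the following G.
# A lookahead is used so that overlapping candidate sites are all reported.
PROTOSPLICE = re.compile(r"(?=[AC]AGG)")


def protosplice_sites(cds):
    """Return (position, phase) for each candidate protosplice site.

    position : number of coding nucleotides preceding the insertion point
    phase    : position % 3 (0 = between codons, 1 or 2 = within a codon),
               assuming the sequence starts in frame at index 0
    """
    seq = cds.upper()
    sites = []
    for match in PROTOSPLICE.finditer(seq):
        position = match.start() + 3  # just after the "AG"
        sites.append((position, position % 3))
    return sites


# Hypothetical example sequence, for illustration only.
print(protosplice_sites("ATGCAGGTAAAAGGC"))
```

Sites with phase 1 or 2 correspond to insertion points within codons, the class of introns whose context the abstract describes examining.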