Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-8501, Japan.
BMC Genomics. 2011 Jan 19;12:45. doi: 10.1186/1471-2164-12-45.
The basic process of RNA splicing is conserved among eukaryotic species. Three signals (5' and 3' splice sites and branch site) are commonly used to directly conduct splicing, while other features are also related to the recognition of an intron. Although there is experimental evidence pointing to the significant species specificities in the features of intron recognition, a quantitative evaluation of the divergence of these features among a wide variety of eukaryotes has yet to be conducted.
To better understand the splicing process from the viewpoints of evolution and information theory, we collected introns from 61 diverse species of eukaryotes and analyzed the properties of the nucleotide sequences relevant to splicing. We found that trees individually constructed from the five features (the three signals, intron length, and nucleotide composition within an intron) roughly reflect the phylogenetic relationships among the species but sometimes deviate substantially from the species classification. Measured by the topological deviation of each feature tree from the reference trees, discordance was lowest for the 5' splicing signal, followed by the 3' splicing signal, and considerably greater for the other three features. We also estimated the relative contributions of the five features to short intron recognition in each species. Again, a moderate correlation was observed between the similarities in the pattern of short intron recognition and the genealogical relationships among the species. When mammalian introns were categorized into three subtypes according to their terminal dinucleotide sequences, each subtype segregated into a nearly monophyletic group, regardless of the host species, with respect to the 5' and 3' splicing signals. We also found that GC-AG introns are extraordinarily abundant in some species with high genomic G + C contents, and that the U12-type spliceosome might make a greater contribution than currently estimated in most species.
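As an illustration of the information-theoretic viewpoint mentioned above, the strength of a splicing signal is commonly summarized as per-position information content in bits. The following is a minimal sketch, not the procedure used in this study: it assumes a uniform background over A, C, G, and T, applies no small-sample correction, and uses a hypothetical function name (`information_content`) and made-up toy sequences.

```python
import math
from collections import Counter


def information_content(seqs, alphabet="ACGT"):
    """Per-position information content (bits) of aligned, equal-length sequences.

    Assumes a uniform background distribution over `alphabet`, so the value
    at each position is log2(len(alphabet)) minus the Shannon entropy there.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    max_bits = math.log2(len(alphabet))
    bits = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        # Only symbols in the alphabet are counted; others (e.g. N) are ignored.
        total = sum(counts[b] for b in alphabet)
        if total == 0:
            bits.append(0.0)  # no usable observations at this position
            continue
        entropy = 0.0
        for b in alphabet:
            p = counts[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(max_bits - entropy)
    return bits


# Hypothetical toy windows starting at the first intron base of a 5' splice site.
toy_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTATGT"]
profile = information_content(toy_sites)
for pos, value in enumerate(profile, start=1):
    print(f"intron position +{pos}: {value:.3f} bits")
```

In this toy input the invariant GT dinucleotide reaches the 2-bit maximum, while variable positions score lower. Comparing such profiles between species is one simple way to quantify how far a signal has diverged.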
Overall, the present study indicates that both the splicing signals themselves and their relative contributions to short intron recognition are rather susceptible to evolutionary changes, while some poorly characterized properties seem to be preserved within the mammalian intron subtypes. Our findings may afford additional clues to understanding the evolution of splicing mechanisms.