CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, P.R. China.
Evol Bioinform Online. 2012;8:301-19. doi: 10.4137/EBO.S9758. Epub 2012 Jun 19.
Repetitive sequences (RSs) are redundant, at times complex, and often lineage-specific, and they represent significant "building" materials for genes and genomes. According to their origins, sequence characteristics, and modes of propagation, repetitive sequences are divided into transposable elements (TEs) and satellite sequences (SSs), each with a hierarchy of related subfamilies and subgroups. The combined changes attributable to repetitive sequences alter gene and genome architectures, for example by expanding exonic, intronic, and intergenic sequences. Most RSs propagate in a seemingly random fashion and contribute substantially to the entire mutation spectrum of mammalian genomes.
Our analysis focuses on the evolutionary features of TEs and SSs in the intronic sequences of twelve selected mammalian genomes. We divided the genomes into four groups (primates, large mammals, rodents, and primary mammals) and used four non-mammalian vertebrate species as the out-group. After classifying intron size variation in an intron-centric way based on RS-dominance (TE-dominant or SS-dominant intron expansions), we observed several distinct profiles in intron length and positioning across vertebrate lineages, such as retrotransposon-dominance in mammals and DNA transposon-dominance in the lower vertebrates (amphibians and fishes). The RS patterns of mouse and rat genes are the most striking: they are distinct not only from those of other mammals but also from that of the third rodent species analyzed in this study, the guinea pig. Looking into the biological functions of the relevant genes, we observed a two-dimensional divergence; in particular, genes that possess SS-dominant and/or RS-free introns are enriched in tissue-specific development and transcription regulation in all mammalian lineages. In addition, we found that transposons tend to increase intron size much more strongly than satellites do. Among the mammals, the combined effect of both RS types is greater than the simple arithmetic sum of their individual effects, whereas the opposite is found among the four non-mammalian vertebrates.
TE- and SS-derived RSs represent major mutational forces shaping the size and composition of vertebrate genes and genomes. Through natural selection, they either fine-tune or facilitate changes in size expansion, position variation, and duplication, and thus in functions and evolutionary paths, for better survival and fitness. When analyzed globally, such changes are not only significantly diversified but also comprehensible in terms of lineages and biological implications.
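The intron-centric classification by RS-dominance described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not define the dominance criterion, so the rule used here (whichever repeat class covers more bases of the intron), the `Intron` record, the `classify_intron` function, and the extra "mixed" label for ties are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intron:
    """Hypothetical record for one intron; field names are illustrative."""
    length: int  # total intron length in base pairs
    te_bp: int   # bases annotated as transposable elements (TEs)
    ss_bp: int   # bases annotated as satellite sequences (SSs)


def classify_intron(intron: Intron) -> str:
    """Label an intron by the repeat class that covers more of its bases.

    Assumption: "dominance" means larger base-pair coverage. The source
    abstract does not state the actual criterion.
    """
    if intron.te_bp == 0 and intron.ss_bp == 0:
        return "RS-free"
    if intron.te_bp > intron.ss_bp:
        return "TE-dominant"
    if intron.ss_bp > intron.te_bp:
        return "SS-dominant"
    return "mixed"  # equal coverage; a category not named in the abstract


# Made-up example introns, for illustration only.
examples = [
    Intron(length=5000, te_bp=2100, ss_bp=150),
    Intron(length=800, te_bp=0, ss_bp=60),
    Intron(length=300, te_bp=0, ss_bp=0),
]
labels = [classify_intron(i) for i in examples]
```

With labels of this kind per intron, intron length distributions could then be compared between categories and between lineages.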