Adelson D L, Raison J M, Garber M, Edgar R C
School of Molecular and Biomedical Science, University of Adelaide, North Terrace, Adelaide, South Australia, Australia.
Anim Genet. 2010 Dec;41 Suppl 2:91-9. doi: 10.1111/j.1365-2052.2010.02115.x.
The interspersed repeat content of mammalian genomes has been best characterized in human, mouse and cow. In this study, we carried out de novo identification of repeated elements in the equine genome and identified previously unknown elements present at low copy number. The equine genome contains typical eutherian mammal repeats, but also has a significant number of hybrid repeats in addition to clade-specific Long Interspersed Nuclear Elements (LINE). Equus caballus clade-specific LINE 1 (L1) repeats can be classified into approximately five subfamilies, three of which have undergone significant expansion. There are 1115 full-length copies of these equine L1, but of the 103 presumptive active copies, 93 fall within a single subfamily, indicating a rapid recent expansion of this subfamily. We also analysed both interspersed and simple sequence repeats (SSR) genome-wide, finding that some repeat classes are spatially correlated with each other as well as with G+C content and gene density. Based on these spatial correlations, we have confirmed that recently described ancestral vs. clade-specific genome territories can be defined by their repeat content. The clade-specific Short Interspersed Nuclear Element correlations were scattered over the genome and appear to have been extensively remodelled. In contrast, territories enriched for ancestral repeats tended to be contiguous domains. To determine if the latter territories were evolutionarily conserved, we compared these results with a similar analysis of the human genome, and observed similar ancestral repeat-enriched domains. These results indicate that ancestral, evolutionarily conserved mammalian genome territories can be identified on the basis of repeat content alone. Interspersed repeats of different ages appear to be analogous to geologic strata, allowing identification of ancient vs. newly remodelled regions of mammalian genomes.