Warburton Peter E, Hasson Dan, Guillem Flavia, Lescale Chloe, Jin Xiaoping, Abrusan Gyorgy
Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029, USA.
BMC Genomics. 2008 Nov 7;9:533. doi: 10.1186/1471-2164-9-533.
Tandemly repeated DNA represents a large portion of the human genome and accounts for a significant amount of copy number variation. Here we present a genome-wide analysis of the largest tandem repeats found in the human genome sequence.
Using Tandem Repeats Finder (TRF), tandem repeat arrays greater than 10 kb in total size were identified and classified into simple sequence arrays (e.g., GAATG), classical satellites (e.g., alpha satellite DNA), and locus-specific VNTR arrays. Analysis of these large sequenced regions revealed that several "simple sequence" arrays actually showed complex domain and/or higher-order repeat organization. Using additional methods, we identified a further 96 arrays with tandem repeat units greater than 2 kb (the detection limit of TRF), 53 of which contained genes or repeated exons. An array of tandem 12 kb repeats that spanned a gap on chromosome 8 was found to be 600 kb to 1.7 Mbp in overall size, making it one of the largest non-centromeric arrays characterized. Several novel megasatellite tandem DNA families were observed that are characterized by repeating patterns of interspersed transposable elements and that have presumably expanded by unequal crossing over. One of these families is found on 11 different chromosomes in >25 arrays, and represents one of the largest and most widespread megasatellite DNA families.
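The 10 kb size filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe in detail. It assumes TRF's plain-text data-file layout, in which each repeat line is whitespace-separated and begins with start, end, period size, and copy number, and in which each sequence is introduced by a `Sequence:` header. The names `Repeat`, `parse_trf_dat`, and `large_arrays` are hypothetical, and the example records are invented for illustration only.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Repeat(NamedTuple):
    """One tandem repeat array as reported on a single TRF data line."""
    sequence: str
    start: int      # 1-based start coordinate of the array
    end: int        # inclusive end coordinate of the array
    period: int     # repeat unit length in bp
    copies: float   # number of copies of the unit

    @property
    def array_size(self) -> int:
        # Total span of the array in bp.
        return self.end - self.start + 1


def parse_trf_dat(lines: Iterable[str]) -> Iterator[Repeat]:
    """Yield repeats from TRF-style lines (assumed layout: 15 fields per record)."""
    sequence = ""
    for line in lines:
        line = line.strip()
        if line.startswith("Sequence:"):
            sequence = line.split(":", 1)[1].strip()
            continue
        fields = line.split()
        # Skip blank lines, the parameter line, and anything that is not a record.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        yield Repeat(sequence, int(fields[0]), int(fields[1]),
                     int(fields[2]), float(fields[3]))


def large_arrays(repeats: Iterable[Repeat], min_size: int = 10_000) -> List[Repeat]:
    """Keep arrays whose total size exceeds min_size (10 kb, as in the abstract)."""
    return [r for r in repeats if r.array_size > min_size]


# Invented example input, for illustration only.
example = """Sequence: chrTest

Parameters: 2 7 7 80 10 50 2000

100 12099 5 2400.0 5 98 1 23000 40 0 40 20 1.52 GAATG GAATGGAATG
500 900 171 2.3 171 85 3 400 30 20 20 30 1.97 ACGT ACGTACGT
"""

hits = large_arrays(parse_trf_dat(example.splitlines()))
for r in hits:
    print(r.sequence, r.start, r.end, r.period, r.array_size)
```

Filtering on the total array span rather than on the repeat unit length mirrors the distinction in the text: TRF is limited by unit size (2 kb), while the 10 kb threshold applies to the whole array.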
This study represents the most comprehensive genome-wide analysis of large tandem repeats in the human genome, and will serve as an important resource for understanding the organization and copy number variation of these complex DNA families.