Subramanian Subbaya, Mishra Rakesh K, Singh Lalji
Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India.
Genome Biol. 2003;4(2):R13. doi: 10.1186/gb-2003-4-2-r13. Epub 2003 Jan 23.
Simple sequence repeats (SSRs) are found in most organisms and occupy about 3% of the human genome. Although it is becoming clear that such repeats are important in genomic organization and function and may be associated with disease conditions, no systematic analysis of them has been reported. This is the first report examining the distribution and density of SSRs with repeat units of 1-6 base pairs (bp) across the entire human genome.
SSR densities were found to be relatively uniform across the human chromosomes, although the overall SSR density was high on chromosome 19. Triplets and hexamers were more abundant in exonic regions than in intronic and intergenic regions, except on chromosome Y. Comparison of the densities of the various SSR classes revealed that trimers and pentamers showed a similar pattern (500-1,000 bp/Mb) across the chromosomes, whereas di-, tetra- and hexa-nucleotide repeats showed a pattern of higher density (2,000-3,000 bp/Mb). Repeats of a single nucleotide were found at higher density than other repeat types. Repeats of A, AT, AC, AAT, AAC, AAG, AGC, AAAC, AAAT, AAAG, AAGG and AGAT predominate, whereas repeats of C, CG, ACT, ACG, AACC, AACG, AACT, AAGC, AAGT, ACCC, ACCG, ACCT, CCCG and CCGG are rare.
The overall SSR density was comparable in all chromosomes. The density of different repeats, however, showed significant variation. Tri- and hexa-nucleotide repeats are more abundant in exons, whereas other repeats are more abundant in non-coding regions.
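The density figures quoted above are expressed as repeat-covered base pairs per megabase of sequence (bp/Mb). As a minimal sketch of how such a figure could be computed, the following Python code finds perfect tandem repeats with 1-6 bp motifs and reports their density by motif length. The function names and the minimum copy-number thresholds are illustrative assumptions; the abstract does not state the detection criteria actually used in the study.

```python
import re
from collections import defaultdict

# Hypothetical thresholds (not from the abstract): minimum number of
# tandem copies required for each motif length in bp.
MIN_COPIES = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ACAC)."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq, min_copies=MIN_COPIES):
    """Return sorted (start, end, motif) tuples for perfect 1-6 bp tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in min_copies.items():
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            # Skip e.g. ACAC repeats, which are counted as AC repeats instead.
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1)))
    return sorted(hits)


def ssr_density(seq, min_copies=MIN_COPIES):
    """Repeat-covered bp per Mb of sequence, keyed by motif length."""
    covered = defaultdict(int)
    for start, end, motif in find_ssrs(seq, min_copies):
        covered[len(motif)] += end - start
    return {k: bp * 1e6 / len(seq) for k, bp in covered.items()}
```

For example, calling ssr_density on the sequence of one chromosome would give one bp/Mb value per motif length, which could then be compared across chromosomes or across exonic, intronic and intergenic subsequences. This sketch detects only perfect repeats, does not merge motifs that are cyclic permutations or reverse complements of one another, and does not resolve overlaps between hits of different motif lengths.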