Rozanska Matylda, Sobczak Krzysztof, Jasinska Anna, Napierala Marek, Kaczynska Danuta, Czerny Anna, Koziel Magdalena, Kozlowski Piotr, Olejniczak Marta, Krzyzosiak Wlodzimierz J
Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland.
Hum Mutat. 2007 May;28(5):451-8. doi: 10.1002/humu.20466.
Although trinucleotide repeats are present in the exons of numerous human genes, their allele distributions are not well known, and the factors responsible for their intergenic and intragenic variability are poorly understood. We analyzed length and sequence variation in the most commonly occurring CAG and CTG repeats in a large number of human genes selected for containing the longest reported repeat tracts. Our study revealed that, in genes other than those implicated in Triplet Repeat Expansion Diseases (TREDs), very long and highly polymorphic repeats are rather infrequent. The length of the pure repeat tract in the most frequent allele correlated well with the rate of repeat length polymorphism, and CAA triplets were the most frequent interruptions of CAG repeats. Because both CAG and CAA triplets code for glutamine, our results may suggest that selective pressure disfavors long uninterrupted CAG repeats in genes and transcripts but not long normal polyglutamine tracts in proteins. This may indicate that hairpin structures formed in ssDNA and RNA by long pure CAG repeats are selected against because they may impede normal cellular processes.
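The distinction the abstract draws between the pure CAG tract and the encoded polyglutamine tract can be illustrated with a minimal sketch. It is not the authors' method: the function names and the example sequence are hypothetical, and the input is assumed to be an in-frame coding sequence read as consecutive triplets.

```python
def codons(seq):
    """Split an in-frame sequence into triplets, dropping any trailing partial triplet."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def longest_pure_tract(seq, unit="CAG"):
    """Length, in repeat units, of the longest uninterrupted in-frame run of `unit`."""
    best = run = 0
    for triplet in codons(seq):
        run = run + 1 if triplet == unit else 0
        best = max(best, run)
    return best


def interruptions(seq, unit="CAG"):
    """Count single non-`unit` triplets flanked on both sides by `unit`."""
    triplets = codons(seq)
    counts = {}
    for prev, cur, nxt in zip(triplets, triplets[1:], triplets[2:]):
        if cur != unit and prev == unit and nxt == unit:
            counts[cur] = counts.get(cur, 0) + 1
    return counts


def polyglutamine_length(seq):
    """Longest in-frame run of glutamine codons (CAG or CAA)."""
    best = run = 0
    for triplet in codons(seq):
        run = run + 1 if triplet in ("CAG", "CAA") else 0
        best = max(best, run)
    return best


# Hypothetical allele: 8 CAG units, one CAA interruption, then 4 CAG units.
example = "CAG" * 8 + "CAA" + "CAG" * 4
pure = longest_pure_tract(example)
breaks = interruptions(example)
polyq = polyglutamine_length(example)
```

In this made-up allele the single CAA shortens the longest pure CAG tract while leaving the glutamine run intact, which is the pattern the abstract interprets as selection acting at the nucleic acid level rather than on the protein.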