Faux Noel G, Huttley Gavin A, Mahmood Khalid, Webb Geoffrey I, de la Banda Maria Garcia, Whisstock James C
Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia.
Genome Res. 2007 Jul;17(7):1118-27. doi: 10.1101/gr.6255407. Epub 2007 Jun 13.
Over 3% of human proteins contain single amino acid repeats (repeat-containing proteins, RCPs). Many repeats (homopeptides) localize to important proteins involved in transcription, and the expansion of certain repeats, in particular poly-Q and poly-A tracts, can also lead to the development of neurological diseases. Previous studies have suggested that the homopeptide makeup is a result of the presence of G+C-rich tracts in the encoding genes and that expansion occurs via replication slippage. Here, we have performed a large-scale genomic analysis of the variation of the genes encoding RCPs in 13 species and present these data in an online database (http://repeats.med.monash.edu.au/genetic_analysis/). This resource allows rapid comparison and analysis of RCPs, homopeptides, and their underlying genetic tracts across the eukaryotic species considered. We report three major findings. First, there is a bias for a small subset of codons being reiterated within homopeptides, and there is no G+C or A+T bias relative to the organism's transcriptome. Second, single base pair transversions from the homocodon are unusually common and may represent a mechanism of reducing the rate of homopeptide mutations. Third, homopeptides that are conserved across different species lie within regions that are under stronger purifying selection in contrast to nonconserved homopeptides.
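The first finding in the abstract, that only a small subset of codons is reiterated within homopeptides, rests on locating each homopeptide in a coding sequence and tallying the codons that encode it. The following is a minimal sketch of that kind of tally, not the authors' pipeline. The function name `homopeptide_codon_usage`, the `min_len=5` threshold, and the toy sequence are illustrative assumptions; the abstract does not state the minimum repeat length used in the study.

```python
from collections import Counter
from itertools import groupby

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def homopeptide_codon_usage(cds, min_len=5):
    """Return (residue, start_codon_index, length, codon_counts) for each
    run of at least `min_len` identical residues in an in-frame CDS.

    `min_len` is a hypothetical threshold chosen for illustration only.
    """
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Codons with ambiguous bases translate to "X" and never form a run.
    residues = [CODON_TABLE.get(codon, "X") for codon in codons]

    runs = []
    pos = 0
    for residue, group in groupby(residues):
        length = len(list(group))
        if length >= min_len and residue not in "*X":
            counts = Counter(codons[pos:pos + length])
            runs.append((residue, pos, length, counts))
        pos += length
    return runs


# Toy example: a poly-Q tract encoded mostly by CAG with one CAA codon.
toy_cds = "ATG" + "CAG" * 4 + "CAA" + "CAG" * 2 + "GCT" + "TAA"
for residue, start, length, counts in homopeptide_codon_usage(toy_cds):
    print(residue, start, length, dict(counts))
```

Comparing the per-run codon counts against the codon frequencies of the whole transcriptome would be the natural next step for assessing the bias described above; that comparison is not shown here.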