Simon Michelle, Hancock John M
Bioinformatics Group, MRC Harwell, Mammalian Genetics Unit, Harwell Science and Innovation Campus, Harwell, Oxfordshire OX11 0RD, UK.
Genome Biol. 2009;10(6):R59. doi: 10.1186/gb-2009-10-6-r59. Epub 2009 Jun 1.
Amino acid repeats (AARs) are common features of protein sequences. They often evolve rapidly and are involved in a number of human diseases. They also show significant associations with particular Gene Ontology (GO) functional categories, particularly transcription, suggesting that they play some role in protein function. It has recently been suggested that AARs play a significant role in the evolution of intrinsically unstructured regions (IURs) of proteins. We investigate the relationship between the frequency and evolution of AARs and their localization within proteins, using a set of 5,815 orthologous proteins from four mammalian genomes (human, chimpanzee, mouse and rat) and one bird genome (chicken). We consider two classes of AAR: tandem repeats and cryptic repeats, the latter being regions of proteins in which short amino acid repeats are overrepresented.
Mammals show very similar repeat frequencies, but chicken shows lower frequencies of many of the cryptic repeats common in mammals. Regions flanking tandem AARs evolve more rapidly than the rest of the protein containing the repeat, and this phenomenon is more pronounced for non-conserved repeats than for conserved ones. GO associations are similar to those previously described for mammals, but chicken cryptic repeats show fewer significant associations. Comparing the overlaps of AARs with IURs and protein domains shows that up to 96% of some AAR types are associated preferentially with IURs. However, no more than 15% of IURs contain an AAR.
Their location within IURs explains many of the evolutionary properties of AARs. Further study is needed on the types of IURs containing AARs.
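The overlap comparison reported above is asymmetric: most AARs of some types fall in IURs, yet few IURs contain an AAR. A minimal sketch of how such a two-way comparison could be computed for a single protein is given here. It is an illustration only, not the authors' pipeline: the half-open residue coordinates, the function names and the toy intervals are all assumptions, and "overlap" is taken to mean any shared residue.

```python
from typing import List, Tuple

# Assumption: features are half-open [start, end) residue intervals.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one residue."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(queries: List[Interval], targets: List[Interval]) -> float:
    """Fraction of query intervals that overlap at least one target interval."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if any(overlaps(q, t) for t in targets))
    return hits / len(queries)


# Hypothetical toy coordinates, not data from the study.
aars = [(10, 18), (40, 46)]
iurs = [(5, 30), (35, 50), (60, 80), (90, 120)]

aar_in_iur = fraction_overlapping(aars, iurs)    # share of AARs lying in an IUR
iur_with_aar = fraction_overlapping(iurs, aars)  # share of IURs containing an AAR
```

In this toy case every AAR lies within an IUR while only half of the IURs contain an AAR, which is the same kind of asymmetry the abstract describes. Swapping the two arguments is what produces the two different percentages.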