Evolutionary Genomics Group, Research Programme on Biomedical Informatics - IMIM Hospital del Mar Research Institute, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona 08003, Spain.
BMC Evol Biol. 2012 Aug 24;12:155. doi: 10.1186/1471-2148-12-155.
Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance and their capacity to expand in relatively short periods of time through replication slippage, they can contribute greatly to expanding protein sequence space and generating novel protein functions. However, little is known about the global impact of LCRs on protein evolution.
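To make the definition above concrete, the following is a minimal sketch of one common way to flag tracts enriched in one or a few amino acids: a sliding window scored by Shannon entropy. It is an illustration only and is not the detection method used in this study; the function names (`shannon_entropy`, `find_lcrs`), the window length of 12 and the entropy cutoff of 1.5 bits are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_lcrs(seq: str, window: int = 12, max_entropy: float = 1.5):
    """Return merged half-open (start, end) intervals of low-entropy windows.

    `window` and `max_entropy` are hypothetical defaults for illustration,
    not parameters taken from the study.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: a 15-residue alanine tract between two diverse flanks.
    toy = "MKTLWDRVEHSP" + "A" * 15 + "GQCNYFIPKTLE"
    for start, end in find_lcrs(toy):
        print(start, end, toy[start:end])
```

Because a window that is mostly alanine still scores below the cutoff, the reported interval can extend a few residues into the flanking sequence. Dedicated tools handle such boundaries more carefully.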
We have traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and other proteins with regulatory functions are overrepresented among proteins containing LCRs. We have found that the gain of novel LCRs is frequently associated with repeat expansion, whereas the loss of LCRs is more often due to the accumulation of amino acid substitutions than to deletions. This dichotomy results in a net gain of protein sequence over time. We have detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine- and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than that of other LCR types. LCRs enriched in positively charged amino acids show the opposite pattern, indicating an important effect of purifying selection in their maintenance.
We have performed the first large-scale study on the evolutionary dynamics of LCRs in protein families. The study has shown that the composition of an LCR is an important determinant of its evolutionary pattern.