Das Roy Rishi, Bhardwaj Manju, Bhatnagar Vasudha, Chakraborty Kausik, Dash Debasis
GNR Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research, Delhi, 110007, India; Department of Biotechnology, University of Pune, Pune, 411007, India.
Department of Computer Science, Maitreyi College, Chanakyapuri, Delhi, 110021, India.
F1000Res. 2014 Jun 27;3:137. doi: 10.12688/f1000research.4307.1. eCollection 2014.
Eubacterial genomes vary considerably in their nucleotide composition: the percentage of genomic DNA made up of guanine and cytosine (GC) nucleotides ranges from 20% to 70%. It has been posited that GC-poor organisms are more dependent on protein-folding machinery. Previous studies have ascribed this dependence to the accumulation of mildly deleterious mutations in these organisms as a result of population bottlenecks. This proposition has been supported by protein folding simulations showing that proteins encoded by GC-poor organisms are more prone to aggregation than those encoded by GC-rich organisms. To test it with a genome-wide approach, we classified different eubacterial proteomes by their aggregation propensity and chaperone dependence using multiple machine learning models. Contrary to the expected decrease in protein aggregation with increasing GC content, we found that the aggregation propensity of proteomes increases with GC content. A similar, and even more significant, correlation was obtained for the GroEL dependence of proteomes: GC-poor proteomes have evolved to be less dependent on GroEL than GC-rich proteomes. We therefore propose that a decrease in eubacterial GC content may have been selected for in organisms facing proteostasis problems.
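To make the two quantities in the abstract concrete, here is a minimal sketch, not the authors' pipeline. It computes GC content as the percentage of G and C among unambiguous A/C/G/T bases, and a rank correlation between per-genome GC content and a per-proteome score (for example, the fraction of proteins a classifier labels aggregation-prone or GroEL-dependent). The function names `gc_content`, `average_ranks`, `pearson` and `spearman` are hypothetical, and using Spearman correlation is an assumption; the abstract does not say which statistic or which machine learning models were used.

```python
from math import sqrt
from statistics import mean


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy sequence, purely illustrative (not a real genome).
    print(gc_content("ATGCGCAT"))  # 50.0
```

In use, one would call `gc_content` on each genome and pass the resulting list, together with the matching per-proteome scores, to `spearman`. A positive coefficient would correspond to the trend reported above, where aggregation propensity or GroEL dependence rises with GC content. A rank-based statistic is chosen here only because it does not assume a linear relationship.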