Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis.

Author information

Tekaia Fredj, Yeramian Edouard, Dujon Bernard

Affiliations

Unité de Génétique Moléculaire des Levures (URA 2171 CNRS and UFR927, Univ. P. M. Curie, Paris), Institut Pasteur, 25, Rue du Dr Roux, F-75724, Paris Cedex 15, France.

Publication information

Gene. 2002 Sep 4;297(1-2):51-60. doi: 10.1016/s0378-1119(02)00871-5.

Abstract

Can we infer the lifestyle of an organism from the characteristic properties of its genome? More precisely, what are the relations between easily quantifiable properties of genomic sequences, such as amino acid compositions, and more subtle characteristics concerning, for example, lifestyles or evolutionary trends? Here, we seek a global picture for such properties, based on a large number (56) of complete genomes, including significant numbers of representatives from the three domains of life. We consider the amino acid compositions of the predicted proteomes, and we use correspondence analysis as a multivariate method to extract the relevant information from the large-scale data. From these analyses we derive a series of conclusions concerning lifestyles, as well as physico-chemical and evolutionary trends: (1) Correspondence analysis of the amino acid compositions permits discrimination between the three known lifestyles (mesophily/thermophily/hyperthermophily). (2) For various organisms, amino acid composition properties are essentially driven by GC content, and to a significantly lesser extent by growth temperatures associated with lifestyles. Roughly speaking, the respective contributions of these two components are 57% and 20%. It is notable that these proportions are essentially unchanged with respect to a previous analysis (Nature 393 (1998) 537), which involved only the 15 genomes available at the time. (3) In terms of amino acid compositional biases, two specific 'signatures' for thermophily (in a broad sense, including hyperthermophily) can be detected. First, thermophilic species display a relative abundance of glutamic acid (Glu), concomitant with a depletion in glutamine. Second, in thermophilic species, the relative abundance of Glu (negative charge) is significantly correlated (Pearson correlation coefficient r = 0.83, P < 0.0001) with the increase in the lumped 'pool' lysine+arginine (positive charges). This correlation (absent in mesophiles) could be interpreted on a physico-chemical basis, relevant to the thermostability of proteins. (4) Statistically significant differences are observed between the average lengths of the genes in the surveyed species, which follow their distribution between the three domains of life. A significant difference is also observed between the average lengths of thermophilic (283.0 ± 5.8) versus mesophilic (340 ± 9.4) genes. It is thus possible that the 'general' shortening of the primary sequences in thermophilic proteins plays a role in thermostability. (5) Considering various combinations of conservation properties (genes conserved exclusively in eukaryotes, in archaea, in bacteria, in combinations of two domains, etc.), correspondence analysis reveals a trend towards thermophilic-hyperthermophilic profiles for the most conserved subset of genes (ancient genes). (6) When limited to the subset of species-specific genes, correspondence analysis leads to a different picture for the clustering of genomes following amino acid compositions: for example, the 'core' specific part of a genome can bear lifestyle signatures different from those of the complete genome. Various results are discussed on both methodological and biological grounds. The evolutionary perspectives opened by our analyses are noted.
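For readers unfamiliar with correspondence analysis, the following is a minimal sketch of the standard textbook formulation (singular value decomposition of the standardized residuals of a genomes-by-amino-acids count table). It is not the authors' implementation. The function name `correspondence_analysis` and the random toy table are hypothetical and stand in for real proteome counts; the share of inertia per axis is the kind of quantity behind statements such as "57% and 20%".

```python
import numpy as np

def correspondence_analysis(counts):
    """Textbook correspondence analysis of a rows-by-columns count table.

    Returns row principal coordinates, column principal coordinates,
    and the share of total inertia carried by each axis.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses (genomes)
    c = P.sum(axis=0)                  # column masses (amino acids)
    expected = np.outer(r, c)          # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                  # principal inertias, sorted descending
    share = inertia / inertia.sum()
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, share

# Hypothetical toy data: 6 "genomes" x 20 amino acids, NOT real proteome counts.
rng = np.random.default_rng(0)
counts = rng.integers(50, 500, size=(6, 20))

row_coords, col_coords, share = correspondence_analysis(counts)
print("Share of inertia on the first two axes:", np.round(share[:2], 3))
```

With real data, each row would hold the amino acid counts of one predicted proteome, and the first two columns of `row_coords` would give the positions of the genomes on the first factorial plane.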
