Montanucci Ludovica, Martelli Pier Luigi, Fariselli Piero, Casadio Rita
Biocomputing Group, CIRB/Dept of Biology, University of Bologna, Bologna, Italy.
J Proteome Res. 2007 Jul;6(7):2502-8. doi: 10.1021/pr060670p. Epub 2007 May 26.
Can genome analysis tell us about the lifestyle of an organism? We address this question through a thorough cross-comparison of thermophilic and mesophilic genomes, since the number of available genomes is now large enough to ensure statistically significant results. Using principal component analysis (PCA), we analyze the codon composition of a database of 116 genomes, selected to include one species per genus, and show that a cross-genomic approach can extract common determinants of thermostability at the genome level. Our results indicate that all the known features of thermostability can be found in the 64 component loadings of the second principal axis of the PCA. From these loadings, we develop an index of thermostability that discriminates mesophiles from thermophiles with 98% accuracy at the genome level and 95% accuracy at the protein sequence level. We also show that these results are not due to phylogenetic differences between archaea and bacteria.
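The abstract describes the approach only in outline, so the following is a minimal sketch of the general technique and not the authors' implementation. It assumes each genome is summarized as a 64-dimensional vector of relative codon frequencies, that PCA is computed by SVD of the mean-centered matrix, and that a genome's score is its projection onto the second principal axis. The function names (`codon_frequencies`, `pca`, `second_axis_score`) are hypothetical, and the normalization and the exact definition of the published index may differ.

```python
from itertools import product

import numpy as np

# All 64 codons in a fixed order; this ordering is an arbitrary choice for the sketch.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_frequencies(coding_sequences):
    """Return the 64 relative codon frequencies pooled over in-frame coding sequences."""
    counts = np.zeros(64)
    for seq in coding_sequences:
        seq = seq.upper()
        # Read non-overlapping triplets from position 0; a trailing partial codon is ignored.
        for i in range(0, len(seq) - 2, 3):
            idx = CODON_INDEX.get(seq[i:i + 3])
            if idx is not None:  # skip codons containing ambiguous bases such as N
                counts[idx] += 1
    total = counts.sum()
    if total == 0:
        raise ValueError("no valid codons found")
    return counts / total


def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data matrix (rows are genomes)."""
    mean = X.mean(axis=0)
    centered = X - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]      # each row holds the 64 loadings of one axis
    scores = centered @ loadings.T    # projection of each genome onto each axis
    return mean, loadings, scores


def second_axis_score(freqs, mean, loadings):
    """Project one codon-frequency vector onto the second principal axis."""
    return float((freqs - mean) @ loadings[1])
```

In use, one would build the matrix `X` with one row per genome from `codon_frequencies`, call `pca(X)`, and compare `second_axis_score` values between genomes or single protein-coding sequences. The sign of a principal axis is arbitrary, so which direction corresponds to thermophiles, and any classification threshold, would have to be fixed from labeled genomes.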