Kreil D P, Ouzounis C A
University of Cambridge and European Bioinformatics Institute, Computational Genomics Group, Research Programme, The European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
Nucleic Acids Res. 2001 Apr 1;29(7):1608-15. doi: 10.1093/nar/29.7.1608.
The global amino acid compositions deduced from the complete genomic sequences of six thermophilic archaea, two thermophilic bacteria, 17 mesophilic bacteria and two eukaryotic species were analysed by hierarchical clustering and principal components analysis. Both methods showed that several factors influence amino acid composition. Although GC content has a dominant effect, thermophilic species can be identified by their global amino acid compositions alone. This study presents a careful statistical analysis of the factors that affect amino acid composition and also yields specific features of the average amino acid composition of thermophilic species. Moreover, we introduce the first example of a 'compositional tree' of species that takes into account not only homologous proteins but also proteins unique to particular species. We expect this simple yet novel approach to be a useful additional tool for the study of phylogeny at the genome level.
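The idea behind a compositional tree can be illustrated with a minimal sketch. Each species is reduced to one vector: the fraction of each of the 20 standard amino acids, pooled over every protein in its proteome. Because every protein is pooled, species-specific proteins contribute as well as homologous ones. The species are then clustered hierarchically. This abstract does not specify the distance measure or linkage rule the authors used, so this sketch assumes Euclidean distance and average linkage. The function names (`global_composition`, `average_linkage_tree`) and the toy proteomes are hypothetical illustrations, not data or code from the study. The principal components analysis step is not shown.

```python
from itertools import combinations
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def global_composition(proteins):
    """Pool every residue of every protein in a proteome and return the
    fraction of each standard amino acid (non-standard letters are ignored)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for seq in proteins:
        for residue in seq.upper():
            if residue in counts:
                counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two composition vectors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_tree(profiles):
    """Agglomerative clustering with average linkage (assumed linkage rule).

    profiles: dict mapping species name -> composition vector.
    Returns nested tuples recording the order in which clusters merged.
    """
    # Each cluster is (subtree, list of member composition vectors).
    clusters = [(name, [vec]) for name, vec in profiles.items()]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mem_i, mem_j = clusters[i][1], clusters[j][1]
            # Average pairwise distance between the members of both clusters.
            d = sum(euclidean(a, b) for a in mem_i for b in mem_j)
            d /= len(mem_i) * len(mem_j)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical toy proteomes: A and B share a K/E/R-rich bias, C does not.
proteomes = {
    "toy_A": ["MKKLLEEIRK", "MEKRVK"],
    "toy_B": ["MKKLVEEIRR", "MEKKVR"],
    "toy_C": ["MSTAGQNSTA", "MSGATQ"],
}
profiles = {name: global_composition(prots) for name, prots in proteomes.items()}
tree = average_linkage_tree(profiles)
print(tree)  # ('toy_C', ('toy_A', 'toy_B'))
```

The two toy species with similar residue biases merge first, whether or not their individual proteins are homologous. This is the property that lets a compositional tree use a species' whole proteome rather than only aligned orthologues.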