Sandberg Rickard, Bränden Carl-Ivar, Ernberg Ingemar, Cöster Joakim
Microbiology and Tumor Biology Center, Karolinska Institute, S-171 77 Stockholm, Sweden.
Gene. 2003 Jun 5;311:35-42. doi: 10.1016/s0378-1119(03)00581-x.
Each prokaryote has a unique genomic signature, evidenced by a set of species-specific frequencies of short oligonucleotides. With respect to genomic signatures, a bacterial genome is homogeneous: the variation within a genome is smaller than the variation between genomes of different species. This study quantifies the species-specificity of genomic signatures in the complete genomes of 57 prokaryotes. The species-specificity of the genomic signature was compared with that of other sequence biases, such as G+C content, synonymous codon choice and amino acid usage. The results confirm that the genomic signature is genome-wide, with high species-specificity in both coding and non-coding regions. In coding regions, the species-specific bias in synonymous codon choice was comparable to that of the genomic signature, while the bias in amino acid usage captured only about 50% of the species-specific bias in the genomic signature. A correlation between the species-specificity in synonymous codon choice and amino acid usage was identified: proteins with species-specific amino acid usage were also encoded with species-specific synonymous codon choice. However, we demonstrated that the G+C content captures only approximately 40% of the species-specificity in the genomic signature and is insufficient to explain the species-specificity in the non-coding regions. Thus, the species-specific bias in non-coding regions remains largely unexplained. Further, we compared the genomic signature in relation to phylogenetic distance, in order to illustrate the feasibility of a hierarchical classification scheme in future applications of the described classification methodology in screening for horizontal gene transfer and in biodiversity studies.
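As a rough illustration of what a genomic signature based on short-oligonucleotide frequencies can look like in practice, the sketch below computes relative k-mer frequencies for a DNA sequence, together with its G+C content. The abstract does not state the oligonucleotide length or the comparison measure used in the study, so the default `k=4`, the Manhattan distance, and the function names `kmer_signature`, `signature_distance` and `gc_content` are assumptions made here for illustration only, not the authors' method.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=4):
    """Relative frequencies of all 4**k DNA k-mers in seq (overlapping windows).

    k=4 is an illustrative default, not a value taken from the paper.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


def signature_distance(sig_a, sig_b):
    """Manhattan distance between two signatures (an assumed, simple measure)."""
    return sum(abs(sig_a[km] - sig_b[km]) for km in sig_a)


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

Under these assumptions, a genome fragment could be compared against reference genomes by computing `signature_distance` between its signature and each reference signature and picking the smallest value. A real classifier would need sequence windows long enough for the k-mer frequencies to be stable.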