Bohlin Jon, Hardy Simon P, Ussery David W
Norwegian School of Veterinary Science, Oslo, Norway.
BMC Genomics. 2009 Jul 31;10:346. doi: 10.1186/1471-2164-10-346.
The genomic fractions of purine (RR) and alternating pyrimidine/purine (YR) stretches of 10 base pairs or more have been linked to genomic AT content, the formation of different DNA helices, strand-biased gene distribution, DNA structure, and more. Although some of these factors are a consequence of the chemical properties of purines and pyrimidines, a thorough statistical examination of the distributions of YR/RR stretches in sequenced prokaryotic chromosomes has, to the best of our knowledge, not been undertaken. The aim of this study is to expand upon previous research by using regression analysis to investigate how AT content, habitat, growth temperature, pathogenicity, phylum, oxygen requirement and halotolerance correlate with the distribution of RR and YR stretches in prokaryotes.
Our results indicate that RR and YR stretches are distributed differently across prokaryotic phyla. RR stretches are overrepresented in all phyla except for the Actinobacteria and beta-Proteobacteria. In contrast, YR stretches are underrepresented in all phyla except for the beta-Proteobacterial group. YR stretches are associated with phylum, pathogenicity and habitat, whilst RR stretches are associated with phylum, AT content, oxygen requirement, growth temperature and halotolerance. All associations described were statistically significant (p < 0.001).
Analysis of chromosomal distributions of RR/YR sequences in prokaryotes reveals a set of associations with environmental factors not observed with mono- and oligonucleotide frequencies. This implies that important information can be found in the distribution of RR/YR stretches that is more difficult to obtain from genomic mono- and oligonucleotide frequencies. The association between pathogenicity and fractions of YR stretches is assumed to be linked to recombination and horizontal transfer.
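The quantity analysed above, the genomic fraction of RR or YR stretches of 10 base pairs or more, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stretch_fraction`, the single-strand scan, and the treatment of ambiguous bases (any character other than A, C, G, T breaks a stretch) are assumptions made for illustration only.

```python
PURINES = frozenset("AG")      # R
PYRIMIDINES = frozenset("CT")  # Y


def _classify(base):
    """Map a nucleotide to 'R' (purine), 'Y' (pyrimidine) or 'N' (other)."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return "N"


def stretch_fraction(seq, kind, min_len=10):
    """Fraction of bases in `seq` covered by qualifying stretches.

    kind="RR": runs of consecutive purines.
    kind="YR": runs in which purines and pyrimidines strictly alternate.
    Only runs of at least `min_len` bases count. Assumption: only the
    given strand is scanned; the complementary strand is ignored.
    """
    if kind not in ("RR", "YR"):
        raise ValueError("kind must be 'RR' or 'YR'")
    classes = [_classify(b) for b in seq.upper()]
    if not classes:
        return 0.0

    covered = 0   # bases inside qualifying stretches
    run = 0       # length of the current candidate stretch
    prev = None   # class of the previous base
    for c in classes:
        if kind == "RR":
            valid = c == "R"
            extends = valid and run > 0
        else:  # "YR": the base must differ in class from the previous one
            valid = c in ("R", "Y")
            extends = valid and run > 0 and c != prev
        if extends:
            run += 1
        else:
            if run >= min_len:
                covered += run
            run = 1 if valid else 0  # a valid base may start a new stretch
        prev = c
    if run >= min_len:  # close a stretch that reaches the sequence end
        covered += run
    return covered / len(classes)
```

For example, `stretch_fraction("AGAGAGAGAGCC", "RR")` counts the ten leading purines and divides by the sequence length of 12. A per-genome fraction computed this way could then serve as the response variable in a regression against factors such as AT content or phylum.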