School of Applied Biosciences, Kyungpook National University, Daegu, Republic of Korea.
PLoS One. 2019 Feb 15;14(2):e0212090. doi: 10.1371/journal.pone.0212090. eCollection 2019.
Variable region analysis of 16S rRNA gene sequences is the most common tool in bacterial taxonomic studies. Although it is used to distinguish bacterial species, its usefulness remains limited because genomes can carry variable numbers of 16S rRNA gene copies whose sequences differ from one another. In this study, we compared 16S rRNA gene sequences of Serratia fonticola GS2 obtained from the completely assembled whole genome with those obtained by Sanger sequencing of cloned PCR products. Sanger sequencing produced a mixture of sequences from the multiple copies of the 16S rRNA gene. To determine whether variant copies affected Sanger sequencing, cloned 16S rRNA genes were mixed at two concentration ratios (5:5 and 8:2); the more copies that shared a similar sequence, the higher the chance that sequence was amplified. The effect of multiple copies on taxonomic classification based on 16S rRNA gene sequences was investigated using strain GS2 as a model. The most divergent 16S rRNA copies had a minimum pairwise similarity of 99.42%, which did not affect species identification. Thus, PCR products from genomes containing variable 16S rRNA gene copies can provide sufficient information for species identification, except for species whose 16S rRNA gene copies are highly similar in sequence, such as Bacillus thuringiensis and Bacillus cereus. An in silico analysis of 1,616 bacterial genomes from long-read sequencing was also performed. For each phylum, we report the average minimum pairwise similarity together with the average genome size and the average number of "unique copies" of 16S rRNA genes; the phyla Proteobacteria and Firmicutes showed the most variation among their 16S rRNA gene copies. Overall, our results shed light on how variation among the multiple copies of bacterial 16S rRNA genes can aid appropriate species identification.
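The two per-genome quantities named in the abstract, the number of "unique copies" and the minimum pairwise similarity, can be illustrated with a minimal sketch. The abstract does not say how similarity was computed, so this sketch assumes that the copies are already aligned to equal length and that similarity is simple percent identity over aligned columns. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns where both sequences carry a gap are uninformative.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def summarize_copies(copies: list[str]) -> tuple[int, float]:
    """Return (number of unique copies, minimum pairwise similarity in %)."""
    unique = sorted(set(copies))
    if len(unique) < 2:
        # A single distinct sequence has no pair to compare; report 100%.
        return len(unique), 100.0
    min_similarity = min(
        percent_identity(a, b) for a, b in combinations(unique, 2)
    )
    return len(unique), min_similarity


# Toy aligned "copies" (hypothetical, far shorter than a real 16S rRNA gene).
toy_copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
n_unique, min_sim = summarize_copies(toy_copies)
print(n_unique, min_sim)
```

Identical copies are collapsed before comparison, so the minimum is taken only over distinct sequences. A real analysis would first need a proper multiple alignment of full-length copies, which this sketch does not attempt.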