Department of Biology, 1001 E 3rd Street, Bloomington, IN 47405, USA.
BMC Microbiol. 2012 Sep 26;12:221. doi: 10.1186/1471-2180-12-221.
Microbial ecologists now routinely use next-generation sequencing methods to assess microbial diversity in the environment. One tool used heavily by many groups is the Naïve Bayesian Classifier developed by the Ribosomal Database Project (RDP-NBC). However, the consistency and confidence of the classifications it provides depend on the training set used.
We explored the stability of RDP-NBC classifications of honey bee gut microbiota sequences using three publicly available ribosomal RNA sequence databases as training sets: ARB-SILVA, Greengenes and RDP. We found that including previously published, high-quality, full-length sequences from 16S rRNA clone libraries in the training set improved the precision with which novel bee-associated sequences were classified. Specifically, when bee-specific 16S rRNA gene sequences were included, the RDP-NBC classified a larger fraction of sequences at higher confidence (based on bootstrap scores).
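To illustrate why the training set governs both the assigned taxon and its bootstrap confidence, here is a minimal sketch of a naive Bayesian k-mer classifier with bootstrap support, written to follow the RDP-NBC's published design as we understand it. It assumes word-specific priors of (n(w) + 0.5) / (N + 1), genus-conditional word probabilities of (m(w) + P_w) / (M + 1), and bootstrap trials that each classify a random one-eighth of the query's words, drawn with replacement. The function names (`train`, `classify`, `classify_with_bootstrap`) are hypothetical and are not the RDP software's API.

```python
import math
import random
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers ("words") in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training_set, k):
    """Build a toy model from (sequence, genus) pairs (hypothetical helper)."""
    n_seqs = len(training_set)
    word_counts = Counter()                   # n(w): sequences containing word w
    genus_sizes = Counter()                   # M: sequences per genus
    genus_word_counts = defaultdict(Counter)  # m(w): sequences in a genus containing w
    for seq, genus in training_set:
        words = kmers(seq, k)
        word_counts.update(words)
        genus_sizes[genus] += 1
        genus_word_counts[genus].update(words)
    # Assumed word-specific prior: P_w = (n(w) + 0.5) / (N + 1)
    priors = {w: (c + 0.5) / (n_seqs + 1) for w, c in word_counts.items()}
    return {"k": k, "N": n_seqs, "priors": priors,
            "sizes": genus_sizes, "gwc": genus_word_counts}


def log_likelihood(model, words, genus):
    """Sum of log P(w | genus) over the supplied words."""
    size = model["sizes"][genus]
    total = 0.0
    for w in words:
        p_w = model["priors"].get(w, 0.5 / (model["N"] + 1))
        m = model["gwc"][genus][w]            # 0 if the genus never had this word
        total += math.log((m + p_w) / (size + 1))
    return total


def classify(model, words):
    """Return the training-set genus with the highest likelihood."""
    return max(model["sizes"], key=lambda g: log_likelihood(model, words, g))


def classify_with_bootstrap(model, seq, trials=100, seed=0):
    """Classify seq, then report the fraction of bootstrap trials agreeing."""
    words = sorted(kmers(seq, model["k"]))
    if not words:
        raise ValueError("sequence is shorter than k")
    best = classify(model, words)
    rng = random.Random(seed)
    subset_size = max(1, len(words) // 8)     # one-eighth of the words per trial
    hits = sum(
        classify(model, [rng.choice(words) for _ in range(subset_size)]) == best
        for _ in range(trials)
    )
    return best, hits / trials
```

Because every probability is computed from training-set counts, a genus absent from the training set can never be returned, and a genus represented by few sequences contributes weaker word evidence, which tends to lower bootstrap support. A toy usage, with invented 20-nt sequences, hypothetical labels `GenusA`/`GenusB`, and k = 4 rather than the 8-mers the RDP-NBC uses:

```python
model = train([("ACGTACGTACGTAAAACCCC", "GenusA"),
               ("ACGTACGTACGTAAAACCCG", "GenusA"),
               ("TTTTGGGGTTTTGGGGCATG", "GenusB"),
               ("TTTTGGGGTTTTGGGGCATC", "GenusB")], k=4)
print(classify_with_bootstrap(model, "ACGTACGTACGTAAAACCCC"))  # ('GenusA', 1.0)
```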
Results from the analysis of these bee-associated sequences have implications for other environments that are represented by few sequences in the public databases or by few bacterial isolates. We conclude that, when exploring relatively novel habitats, including high-quality, full-length 16S rRNA gene sequences in the training set allows more confident taxonomic classification.