The Pathogen and Microbiome Institute, Northern Arizona University, PO Box 4073, Flagstaff, AZ, 86011-4073, USA.
Research School of Biology, Australian National University, 46 Sullivans Creek Road, Acton ACT, 2601, Australia.
Microbiome. 2018 May 17;6(1):90. doi: 10.1186/s40168-018-0470-z.
Taxonomic classification of marker-gene sequences is an important step in microbiome analysis.
We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of the other commonly used marker-gene classification methods evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data).
Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
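To make the idea of a scikit-learn naive Bayes taxonomy classifier concrete, the following is a minimal sketch, not the plugin's implementation. It assumes sequences are represented as counts of overlapping k-mers and that a prediction is reported only when its posterior probability clears a confidence threshold. The toy reference sequences, the taxonomy labels, the helper names train and classify, and the values of k, alpha, and min_confidence are all hypothetical choices made for illustration. They also stand in for the kind of parameters that tuning would adjust.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reference set (sequence -> taxonomy label); not real marker-gene data.
REFERENCE = {
    "ACGTACGTACGTACGTACGT": "k__Bacteria; g__Alpha",
    "ACGTACGTACGTACGAACGT": "k__Bacteria; g__Alpha",
    "TTGGCCAATTGGCCAATTGG": "k__Bacteria; g__Beta",
    "TTGGCCAATTGGCCTATTGG": "k__Bacteria; g__Beta",
}


def train(reference, k=4, alpha=0.01):
    """Fit a multinomial naive Bayes model on overlapping k-mer counts."""
    model = make_pipeline(
        # Character n-grams of fixed length k act as k-mers; keep case as given.
        CountVectorizer(analyzer="char", ngram_range=(k, k), lowercase=False),
        MultinomialNB(alpha=alpha),  # alpha is the additive smoothing parameter
    )
    model.fit(list(reference.keys()), list(reference.values()))
    return model


def classify(model, sequence, min_confidence=0.7):
    """Return (label, confidence); the label is "Unassigned" below the threshold."""
    probabilities = model.predict_proba([sequence])[0]
    best = probabilities.argmax()
    confidence = float(probabilities[best])
    if confidence < min_confidence:
        return "Unassigned", confidence
    return str(model.classes_[best]), confidence


model = train(REFERENCE)
query = "ACGTACGTACGTACGTACGA"  # hypothetical read resembling the g__Alpha references
label, confidence = classify(model, query)
print(label, round(confidence, 3))
```

This sketch assigns a single flat label per sequence. A production classifier would typically also handle reverse-complemented reads and report the deepest taxonomic rank that meets the confidence threshold, which is omitted here for brevity.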