Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

BACKGROUND: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited.

RESULTS: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages.

CONCLUSIONS: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
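
The filtering strategies named in the results (abundance filtering, tool intersection, and ensemble voting) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the tool names, species profiles, abundance values, the 0.1% threshold, and the helper functions `abundance_filter`, `intersect_tools`, and `vote_tools` are all hypothetical and chosen only to show the idea.

```python
from collections import Counter

def abundance_filter(profile, min_abundance=0.001):
    """Keep taxa whose relative abundance is at least min_abundance (assumed threshold)."""
    return {taxon: ab for taxon, ab in profile.items() if ab >= min_abundance}

def intersect_tools(profiles):
    """Return taxa reported by every tool (strict intersection)."""
    taxa_sets = [set(p) for p in profiles.values()]
    return set.intersection(*taxa_sets) if taxa_sets else set()

def vote_tools(profiles, min_votes=2):
    """Return taxa reported by at least min_votes tools (simple ensemble vote)."""
    counts = Counter(taxon for p in profiles.values() for taxon in p)
    return {taxon for taxon, n in counts.items() if n >= min_votes}

# Hypothetical species-level relative abundances from three classifiers,
# one per strategy mentioned in the abstract (k-mer, alignment, marker).
profiles = {
    "kmer_tool": {
        "Escherichia coli": 0.60,
        "Bacillus subtilis": 0.3995,
        "Bacillus anthracis": 0.0005,  # low-abundance call, likely a false positive
    },
    "alignment_tool": {"Escherichia coli": 0.55, "Bacillus subtilis": 0.45},
    "marker_tool": {"Escherichia coli": 0.70, "Shigella flexneri": 0.30},
}

# Apply the abundance filter per tool, then combine the filtered calls.
filtered = {name: abundance_filter(p) for name, p in profiles.items()}
consensus_all = intersect_tools(filtered)
consensus_majority = vote_tools(filtered, min_votes=2)

print(sorted(consensus_all))
print(sorted(consensus_majority))
```

In this toy example the abundance filter drops the low-abundance call, the strict intersection keeps only the species all three tools agree on, and the two-vote ensemble is more permissive. The trade-off between the two combination rules mirrors the precision and recall trade-off the abstract describes.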
