Freitas Tracey Allen K, Li Po-E, Scholz Matthew B, Chain Patrick S G
Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
Nucleic Acids Res. 2015 May 26;43(10):e69. doi: 10.1093/nar/gkv180. Epub 2015 Mar 12.
A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm circumvents false positives using a series of non-redundant signature databases and examines Genomic Origins Through Taxonomic CHAllenge (GOTTCHA). GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools.