Department of Computer Science, University of California, Los Angeles, CA, 90095, USA.
Department of Computer Science, ETH Zurich, Rämistrasse 101, CH-8092, Zurich, Switzerland.
Genome Biol. 2020 Sep 10;21(1):242. doi: 10.1186/s13059-020-02159-0.
Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment min hash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
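The pre-filtering idea can be illustrated with a minimal sketch, written under stated assumptions rather than as Metalign's actual implementation. It estimates, for each reference genome, the fraction of a bottom-s sketch of its k-mer hashes that also occurs in the sample, and keeps only genomes above a cutoff for the later alignment step. The function names, the parameter values (`k`, `size`, `cutoff`), the use of a plain Python set in place of a probabilistic membership structure, and the omission of reverse-complement (canonical) k-mers are all simplifying assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer with a stable hash."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, size):
    """Keep the `size` smallest k-mer hashes of a reference sequence."""
    return sorted({hash_kmer(x) for x in kmers(seq, k)})[:size]


def containment(genome_sketch, sample_hashes):
    """Estimate the fraction of the genome's k-mers present in the sample."""
    if not genome_sketch:
        return 0.0
    hits = sum(1 for value in genome_sketch if value in sample_hashes)
    return hits / len(genome_sketch)


def prefilter(references, reads, k=5, size=50, cutoff=0.5):
    """Return {name: containment} for references passing the cutoff.

    `references` maps a hypothetical genome name to its sequence;
    `reads` is a list of read sequences from the sample.
    """
    # Assumption: an exact set stands in for a memory-efficient filter.
    sample_hashes = {hash_kmer(x) for read in reads for x in kmers(read, k)}
    kept = {}
    for name, seq in references.items():
        score = containment(bottom_sketch(seq, k, size), sample_hashes)
        if score >= cutoff:
            kept[name] = score
    return kept


if __name__ == "__main__":
    # Toy data: the reads are drawn from genome_A only.
    refs = {
        "genome_A": "ACCACAACCCAACACCAAAC",
        "genome_B": "GTTGTGGTTTGGTGTTGGGT",
    }
    sample_reads = [refs["genome_A"][:12], refs["genome_A"][8:]]
    print(prefilter(refs, sample_reads))
```

Only the genomes returned by `prefilter` would then be passed to an aligner, which is what keeps the alignment step tractable on a large reference database. The k-mer length and cutoff used here are toy values chosen for short strings, not recommended settings.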