Qmatey: an automated pipeline for fast exact matching-based alignment and strain-level taxonomic binning and profiling of metagenomes.

Affiliations

Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA.

UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA.

Publication information

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad351.

DOI:10.1093/bib/bbad351

PMID:37824740

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10569747/

Abstract

Metagenomics is a powerful tool for understanding organismal interactions; however, classification, profiling and detection of interactions at the strain level remain challenging. We present an automated pipeline, quantitative metagenomic alignment and taxonomic exact matching (Qmatey), that performs a fast exact matching-based alignment and integration of taxonomic binning and profiling. It interrogates large databases without using metagenome-assembled genomes, curated pan-genes or k-mer spectra that limit resolution. Qmatey minimizes misclassification and maintains strain level resolution by using only diagnostic reads as shown in the analysis of amplicon, quantitative reduced representation and shotgun sequencing datasets. Using Qmatey to analyze shotgun data from a synthetic community with 35% of the 26 strains at low abundance (0.01-0.06%), we revealed a remarkable 85-96% strain recall and 92-100% species recall while maintaining 100% precision. Benchmarking revealed that the highly ranked Kraken2 and KrakenUniq tools identified 2-4 more taxa (92-100% recall) than Qmatey but produced 315-1752 false positive taxa and high penalty on precision (1-8%). The speed, accuracy and precision of the Qmatey pipeline positions it as a valuable tool for broad-spectrum profiling and for uncovering biologically relevant interactions.
