Shah Nidhi, Molloy Erin K, Pop Mihai, Warnow Tandy
Department of Computer Science, University of Maryland College Park, College Park, MD 20742, USA.
Department of Computer Science, University of California Los Angeles, CA 90095, USA.
Bioinformatics. 2021 Jul 27;37(13):1839-1845. doi: 10.1093/bioinformatics/btab023.
Metagenomics has revolutionized microbiome research by enabling researchers to characterize the composition of complex microbial communities. Taxonomic profiling is one of the critical steps in metagenomic analyses. Marker genes, which are single-copy and universally found across Bacteria and Archaea, can provide accurate estimates of taxon abundances in the sample.
We present TIPP2, a marker gene-based abundance profiling method, which combines phylogenetic placement with statistical techniques to control classification precision and recall. TIPP2 includes an updated set of reference packages and several algorithmic improvements over the original TIPP method. We find that TIPP2 provides comparable or better estimates of abundance than other profiling methods (including Bracken, mOTUsv2 and MetaPhlAn2), and strictly dominates other methods when there are under-represented (novel) genomes present in the dataset.
The code for our method is freely available in open-source form at https://github.com/smirarab/sepp/blob/tipp2/README.TIPP.md. The code and procedure to create new reference packages for TIPP2 are available at https://github.com/shahnidhi/TIPP_reference_package.
Supplementary data are available at Bioinformatics online.