Czech Lucas, Stamatakis Alexandros, Dunthorn Micah, Barbera Pierre
Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States.
Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Front Bioinform. 2022 May 26;2:871393. doi: 10.3389/fbinf.2022.871393. eCollection 2022.
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. This can increase the accuracy of metagenomic surveys and eliminates the requirement for exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years of phylogenetic placement. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, and 5) to showcase typical placement-based analyses and how they can help to analyze, visualize, and interpret phylogenetic placement data.