Graduate Program in Bioinformatics, University of British Columbia, Genome Sciences Centre, Vancouver, British Columbia, Canada.
Department of Microbiology and Immunology, University of British Columbia, Vancouver, British Columbia, Canada.
Curr Protoc. 2023 Feb;3(2):e671. doi: 10.1002/cpz1.671.
Gene-centric analysis is commonly used to chart the structure, function, and activity of microbial communities in natural and engineered environments. A common approach is to create custom ad hoc reference marker gene sets, but these come with the typical disadvantages of inaccuracy and limited utility beyond assigning query sequences taxonomic labels. The Tree-based Sensitive and Accurate Phylogenetic Profiler (TreeSAPP) software package standardizes analysis of phylogenetic and functional marker genes and improves predictive performance using a classification algorithm that leverages information-rich reference packages consisting of a multiple sequence alignment, a profile hidden Markov model, taxonomic lineage information, and a phylogenetic tree. Here, we provide a set of protocols that link the various analysis modules in TreeSAPP into a coherent process that both informs and directs the user experience. This workflow, initiated from a collection of candidate reference sequences, progresses through construction and refinement of a reference package to marker identification and normalized relative abundance calculations for homologous sequences in metagenomic and metatranscriptomic datasets. The alpha subunit of methyl-coenzyme M reductase (McrA) involved in biological methane cycling is presented as a use case given its dual role as a phylogenetic and functional marker gene driving an ecologically relevant process. These protocols fill several gaps in prior TreeSAPP documentation and provide best practices for reference package construction and refinement, including manual curation steps from trusted sources in support of reproducible gene-centric analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: Creating reference packages
Support Protocol 1: Installing TreeSAPP
Support Protocol 2: Annotating traits within a phylogenetic context
Basic Protocol 2: Updating reference packages
Basic Protocol 3: Calculating relative abundance of genes in metagenomic and metatranscriptomic datasets