Northern Gulf Institute, Mississippi State University, Mississippi State, MS 39762, USA.
Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, FL 33149, USA.
Gigascience. 2022 Jul 28;11. doi: 10.1093/gigascience/giac066.
Amplicon sequencing (metabarcoding) is a common method to survey diversity of environmental communities whereby a single genetic locus is amplified and sequenced from the DNA of whole or partial organisms, organismal traces (e.g., skin, mucus, feces), or microbes in an environmental sample. Several software packages exist for analyzing amplicon data, among which QIIME 2 has emerged as a popular option because of its broad functionality, plugin architecture, provenance tracking, and interactive visualizations. However, each new analysis requires the user to keep track of input and output file names, parameters, and commands; this lack of automation and standardization is inefficient and creates barriers to meta-analysis and sharing of results.
We developed Tourmaline, a Python-based workflow that implements QIIME 2 and is built using the Snakemake workflow management system. Starting from a configuration file that defines parameters and input files (a reference database, a sample metadata file, and a manifest or archive of FASTQ sequences), it uses QIIME 2 to run either the DADA2 or Deblur denoising algorithm; assigns taxonomy to the resulting representative sequences; performs analyses of taxonomic, alpha, and beta diversity; and generates an HTML report summarizing and linking to the output files. Features include support for multiple cores, automatic determination of trimming parameters using quality scores, representative sequence filtering (by taxonomy, length, abundance, prevalence, or ID), support for multiple taxonomic classification and sequence alignment methods, outlier detection, and automated initialization of a new analysis using previous settings. The workflow runs natively on Linux and macOS or via a Docker container. We ran Tourmaline on a 16S ribosomal RNA amplicon data set from Lake Erie surface water, showing its utility for parameter optimization and the ability to easily view interactive visualizations through the HTML report, QIIME 2 viewer, and R- and Python-based Jupyter notebooks.
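As an illustration of how trimming parameters could be derived from quality scores, the sketch below picks a truncation length from per-position Phred scores. It is not Tourmaline's actual rule, which the abstract does not describe. The function name `choose_truncation_length`, the threshold `min_median_quality`, and the toy data are all hypothetical, and the rule used (truncate at the first position whose median quality drops below a threshold) is an assumption chosen for simplicity.

```python
from statistics import median


def choose_truncation_length(quality_by_position, min_median_quality=30):
    """Return how many leading read positions to keep.

    quality_by_position: list where element i holds the Phred scores
    observed at read position i across a sample of reads.
    Assumed rule (hypothetical): truncate at the first position whose
    median score falls below min_median_quality.
    """
    for position, scores in enumerate(quality_by_position):
        if median(scores) < min_median_quality:
            return position
    # No position fell below the threshold: keep the full read length.
    return len(quality_by_position)


# Toy data: five read positions, three reads sampled at each position.
toy_scores = [
    [38, 37, 39],
    [36, 35, 37],
    [34, 33, 30],
    [28, 25, 31],
    [20, 18, 22],
]

trunc_len = choose_truncation_length(toy_scores)
print(f"Suggested truncation length: {trunc_len}")
```

A value chosen this way would typically be passed to the denoising step as a truncation length; in practice the threshold would need tuning per data set and per read direction.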
Automated workflows like Tourmaline enable rapid analysis of environmental amplicon data, decreasing the time from data generation to actionable results. Tourmaline is available for download at github.com/aomlomics/tourmaline.