Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake.

Affiliations

Northern Gulf Institute, Mississippi State University, Mississippi State, MS 39762, USA.

Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Miami, FL 33149, USA.

Publication information

Gigascience. 2022 Jul 28;11. doi: 10.1093/gigascience/giac066.

DOI: 10.1093/gigascience/giac066

PMID: 35902092

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9334028/

Abstract

BACKGROUND

Amplicon sequencing (metabarcoding) is a common method to survey diversity of environmental communities whereby a single genetic locus is amplified and sequenced from the DNA of whole or partial organisms, organismal traces (e.g., skin, mucus, feces), or microbes in an environmental sample. Several software packages exist for analyzing amplicon data, among which QIIME 2 has emerged as a popular option because of its broad functionality, plugin architecture, provenance tracking, and interactive visualizations. However, each new analysis requires the user to keep track of input and output file names, parameters, and commands; this lack of automation and standardization is inefficient and creates barriers to meta-analysis and sharing of results.

FINDINGS

We developed Tourmaline, a Python-based workflow that implements QIIME 2 and is built using the Snakemake workflow management system. Starting from a configuration file that defines parameters and input files (a reference database, a sample metadata file, and a manifest or archive of FASTQ sequences), it uses QIIME 2 to run either the DADA2 or Deblur denoising algorithm; assigns taxonomy to the resulting representative sequences; performs analyses of taxonomic, alpha, and beta diversity; and generates an HTML report summarizing and linking to the output files. Features include support for multiple cores, automatic determination of trimming parameters using quality scores, representative sequence filtering (by taxonomy, length, abundance, prevalence, or ID), support for multiple taxonomic classification and sequence alignment methods, outlier detection, and automated initialization of a new analysis using previous settings. The workflow runs natively on Linux and macOS or via a Docker container. We ran Tourmaline on a 16S ribosomal RNA amplicon data set from Lake Erie surface water, showing its utility for parameter optimization and the ability to easily view interactive visualizations through the HTML report, QIIME 2 viewer, and R- and Python-based Jupyter notebooks.
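
To make the representative sequence filtering step more concrete, the following is a minimal Python sketch of filtering by length, abundance, and prevalence. It is an illustration only and is not taken from Tourmaline or QIIME 2: the `Feature` class, the `filter_features` function, the threshold names, and the toy data are all hypothetical, and the abstract does not state how the real workflow applies these thresholds.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A representative sequence with per-sample read counts (hypothetical structure)."""
    feature_id: str
    sequence: str
    counts: dict  # sample ID -> read count


def filter_features(features, min_length=0, max_length=10**9,
                    min_abundance=1, min_prevalence=1):
    """Keep features passing length, total-abundance, and prevalence thresholds.

    Abundance is the total read count across samples; prevalence is the
    number of samples in which the feature has at least one read.
    These definitions are assumptions made for this sketch.
    """
    kept = []
    for f in features:
        total = sum(f.counts.values())
        prevalence = sum(1 for c in f.counts.values() if c > 0)
        if not (min_length <= len(f.sequence) <= max_length):
            continue  # sequence length outside the allowed range
        if total < min_abundance or prevalence < min_prevalence:
            continue  # too rare overall or seen in too few samples
        kept.append(f)
    return kept


# Toy data: three made-up features across three made-up samples.
features = [
    Feature("f1", "ACGT" * 60, {"s1": 120, "s2": 80, "s3": 0}),
    Feature("f2", "ACGT" * 10, {"s1": 500, "s2": 300, "s3": 10}),  # too short
    Feature("f3", "ACGT" * 62, {"s1": 3, "s2": 0, "s3": 0}),       # rare, one sample
]
kept = filter_features(features, min_length=200, max_length=300,
                       min_abundance=10, min_prevalence=2)
print([f.feature_id for f in kept])
```

In this toy run only `f1` passes all three thresholds. In a real analysis the thresholds would presumably be set in the workflow's configuration file rather than passed as function arguments.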

CONCLUSION

Automated workflows like Tourmaline enable rapid analysis of environmental amplicon data, decreasing the time from data generation to actionable results. Tourmaline is available for download at github.com/aomlomics/tourmaline.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/453a/9334028/014e4ef53fb8/giac066fig1.jpg
