Abdala Asbun Alejandro, Besseling Marc A, Balzano Sergio, van Bleijswijk Judith D L, Witte Harry J, Villanueva Laura, Engelmann Julia C
Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, Texel, Netherlands.
Department of Earth Sciences, Faculty of Geosciences, Utrecht University, Utrecht, Netherlands.
Front Genet. 2020 Nov 20;11:489357. doi: 10.3389/fgene.2020.489357. eCollection 2020.
Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means of assessing environmental microbial communities, microbiomes associated with plants and animals, and communities of multicellular organisms via environmental DNA sequencing. Because this technique sequences a single gene, or even only part of a single gene, rather than the entire genome, it needs fewer reads per sample to assess microbial community structure than metagenome sequencing does. This makes marker gene sequencing affordable to nearly any laboratory. Although data generation is relatively easy and cost-efficient, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a molecular biologist or ecologist. We have developed CASCABEL, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline that uses Snakemake and a combination of existing and newly developed solutions for its computational steps. CASCABEL takes raw data as input and delivers a table of operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) in BIOM and text format, together with their representative sequences. CASCABEL is highly versatile software that lets users customize several steps of the pipeline, such as choosing among a set of OTU clustering methods or performing ASV analysis instead. In addition, we designed CASCABEL to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing where possible. The analyses and results are fully reproducible and documented in an HTML report and an optional PDF report. CASCABEL is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL.
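To make the pipeline's main output concrete, the sketch below builds a feature-by-sample count table in plain text plus a FASTA file of representative sequences from already quality-filtered reads. It is a minimal illustration under stated assumptions, not CASCABEL's implementation: the function names `build_feature_table`, `to_tsv`, and `to_fasta` and the `ASV_n` identifiers are hypothetical, and grouping reads by exact sequence identity is a simplified stand-in for real OTU clustering or ASV inference (denoising tools additionally model sequencing errors). The tab-separated layout only mimics the common `#OTU ID` header style; binary BIOM output is not produced here.

```python
from collections import Counter


def build_feature_table(samples):
    """Group reads by exact sequence (a simplified, hypothetical stand-in
    for OTU clustering or ASV denoising) and count them per sample.

    samples: dict mapping sample name -> list of quality-filtered reads.
    Returns (feature_ids, seqs, sample_names, counts), where counts[i][j]
    is the number of reads of feature i observed in sample j.
    """
    sample_names = sorted(samples)
    per_sample = {s: Counter(samples[s]) for s in sample_names}

    # Total abundance of each unique sequence across all samples.
    totals = Counter()
    for sample_counts in per_sample.values():
        totals.update(sample_counts)

    # Most abundant features first; ties broken alphabetically by sequence.
    seqs = sorted(totals, key=lambda q: (-totals[q], q))
    feature_ids = [f"ASV_{i + 1}" for i in range(len(seqs))]
    # Counter returns 0 for sequences absent from a sample.
    counts = [[per_sample[s][q] for s in sample_names] for q in seqs]
    return feature_ids, seqs, sample_names, counts


def to_tsv(feature_ids, sample_names, counts):
    """Render the count table as tab-separated text (one row per feature)."""
    lines = ["#OTU ID\t" + "\t".join(sample_names)]
    for fid, row in zip(feature_ids, counts):
        lines.append(fid + "\t" + "\t".join(str(n) for n in row))
    return "\n".join(lines) + "\n"


def to_fasta(feature_ids, seqs):
    """Write one representative sequence per feature in FASTA format."""
    return "".join(f">{fid}\n{seq}\n" for fid, seq in zip(feature_ids, seqs))


if __name__ == "__main__":
    # Toy reads, not real data.
    reads = {
        "S1": ["ACGT", "ACGT", "ACGA"],
        "S2": ["ACGA", "ACGT", "TTGA"],
    }
    ids, seqs, names, counts = build_feature_table(reads)
    print(to_tsv(ids, names, counts), end="")
    print(to_fasta(ids, seqs), end="")
```

Sorting features by total abundance mirrors the common convention of numbering the most abundant OTUs or ASVs first, which keeps identifiers stable for a given input.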