Helmholtz Centre for Environmental Research GmbH - UFZ, Department of Soil Ecology; Theodor-Lieser-Str. 4, 06120 Halle, Germany.
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Metagenomics Support Unit; Puschstr. 4, 04103 Leipzig, Germany.
Gigascience. 2020 Nov 30;9(12). doi: 10.1093/gigascience/giaa135.
Amplicon sequencing of phylogenetic marker genes, e.g., 16S, 18S, or ITS ribosomal RNA sequences, is still the most commonly used method to determine the composition of microbial communities. Microbial ecologists often have expert knowledge of their biological question and of data analysis in general, and most research institutes have the computational infrastructure to run the bioinformatics command line tools and workflows for amplicon sequencing analysis. However, the bioinformatics skills required often limit the efficient and up-to-date use of these computational resources.
We present dadasnake, a user-friendly, single-command Snakemake pipeline that wraps the preprocessing of sequencing reads and the delineation of exact sequence variants, using the favorably benchmarked and widely used DADA2 algorithm, together with taxonomic classification and post-processing of the resulting tables, including hand-off in standard formats. The suitability of the provided default configurations is demonstrated using mock community data from bacteria and archaea, as well as fungi.
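To illustrate what a table of exact sequence variants looks like, the following minimal Python sketch may help. It is not dadasnake or DADA2 code: it only counts identical reads per sample (dereplication) and omits the error-model-based denoising that DADA2 performs. The function name `build_variant_table`, the sample names, and the reads are all hypothetical.

```python
from collections import Counter


def build_variant_table(reads_by_sample):
    """Count identical sequences per sample.

    Returns (variants, table), where `variants` is the list of distinct
    sequences and `table[sample]` holds one count per variant, in order.
    NOTE: plain dereplication only; no denoising as done by DADA2.
    """
    counts = {sample: Counter(reads) for sample, reads in reads_by_sample.items()}

    # Total abundance of each distinct sequence across all samples.
    totals = Counter()
    for sample_counts in counts.values():
        totals.update(sample_counts)

    # Most abundant first; ties broken alphabetically for a stable order.
    variants = sorted(totals, key=lambda seq: (-totals[seq], seq))

    table = {
        sample: [sample_counts.get(seq, 0) for seq in variants]
        for sample, sample_counts in counts.items()
    }
    return variants, table


# Hypothetical toy input: two samples with very short made-up reads.
toy_reads = {
    "sampleA": ["ACGT", "ACGT", "ACGA"],
    "sampleB": ["ACGT", "TTGA"],
}
variants, table = build_variant_table(toy_reads)
for sample, row in table.items():
    print(sample, dict(zip(variants, row)))
```

In a real analysis, each row would be a sample and each column a denoised sequence variant, with taxonomic labels attached in a later step.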
By using Snakemake, dadasnake makes efficient use of high-performance computing infrastructures. Easy user configuration ensures flexibility in all steps, including the processing of data from multiple sequencing platforms. dadasnake is easy to install via conda environments and is available at https://github.com/a-h-b/dadasnake.