Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich NR4 7UQ, UK.
Medical Sciences Department, University of Turin, 10126 Turin, Italy.
Int J Mol Sci. 2021 May 18;22(10):5309. doi: 10.3390/ijms22105309.
The taxonomic composition of microbial communities can be assessed using universal marker amplicon sequencing. The most common taxonomic markers are the 16S rDNA for bacterial communities and the internal transcribed spacer (ITS) region for fungal communities, but various other markers are used for barcoding eukaryotes. A crucial step in the bioinformatic analysis of amplicon sequences is the identification of representative sequences. This can be achieved either by clustering or by denoising the raw sequencing reads. DADA2 is a widely adopted algorithm, released as an R library, that denoises marker-specific amplicons from next-generation sequencing and produces a set of representative sequences referred to as 'Amplicon Sequence Variants' (ASVs). Here, we present Dadaist2, a modular pipeline that provides a complete analysis suite, ranging from raw sequencing reads to the statistics of numerical ecology. Dadaist2 implements a new approach that is specifically optimised for amplicons of variable length, such as the fungal ITS. The pipeline focuses on streamlining the data flow from the command line to R, with multiple options for statistical analysis and plotting, both interactive and automatic.