Department of Plant Physiology, Genetics and Biotechnology, University of Warmia and Mazury in Olsztyn, 10-719 Olsztyn, Poland.
Department of Environmental Biotechnology, University of Warmia and Mazury in Olsztyn, 11-709 Olsztyn, Poland.
Int J Mol Sci. 2024 Mar 20;25(6):3508. doi: 10.3390/ijms25063508.
Taxonomic classification by metabarcoding is widely used in microbiological studies of environmental samples and in the monitoring of biotechnological processes. However, results from different laboratories are difficult to compare because of the variety of bioinformatics tools that have been developed and used for data analysis. The problem is compounded by differing choices of which variable region of the 16S rRNA gene is sequenced and which database is used for taxonomic identification. Therefore, this study used the DADA2 algorithm to optimize the preprocessing of raw data obtained from sequencing activated sludge samples, analyzing three frequently used regions of the 16S rRNA gene (V1-V3, V3-V4, V4-V5) simultaneously. Additionally, the study evaluated which variable region and which of the frequently used microbial databases for taxonomic classification (Greengenes2, SILVA, RefSeq) classify OTUs into taxa most accurately. By adjusting the values of selected DADA2 parameters, we obtained the highest possible number of OTUs for each region. Regarding biodiversity within regions, the V3-V4 region had the highest Simpson and Shannon indexes, and its Chao1 index was similar to that of the V1-V3 region. Beta-diversity analysis revealed statistically significant differences between regions. When the databases were compared for each of the regions studied, the SILVA database yielded the highest number of taxonomic groups. These results suggest that standardization of metabarcoding of short amplicons may be possible.
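For readers unfamiliar with the alpha-diversity indexes named in the abstract, the following is a minimal sketch of how the Shannon, Simpson, and Chao1 indexes can be computed from a vector of per-OTU read counts. The abstract does not state which software or formula variants were used, so the choices here are assumptions: natural-log Shannon entropy, the Gini-Simpson form (1 minus the sum of squared proportions), and the bias-corrected Chao1 estimator. The function names and the example counts are hypothetical and are not data from the study.

```python
import math


def shannon_index(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2); an assumed variant, others exist."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


def chao1_index(counts):
    """Bias-corrected Chao1 richness: S_obs + f1*(f1-1) / (2*(f2+1)).

    f1 and f2 are the numbers of OTUs observed exactly once and twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


if __name__ == "__main__":
    # Hypothetical per-OTU read counts for one sample (not from the study).
    example_counts = [120, 45, 30, 2, 1, 1, 0]
    print("Shannon:", round(shannon_index(example_counts), 4))
    print("Simpson:", round(simpson_index(example_counts), 4))
    print("Chao1:  ", round(chao1_index(example_counts), 4))
```

Shannon and Simpson weight OTUs by relative abundance, whereas Chao1 estimates total richness from the rarest OTUs. This is why a region can rank highest on the first two indexes while having a Chao1 value similar to another region's.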