Joos Lisa, Beirinckx Stien, Haegeman Annelies, Debode Jane, Vandecasteele Bart, Baeyen Steve, Goormachtig Sofie, Clement Lieven, De Tender Caroline
Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Plant Sciences Unit, Burgemeester Van Gansberghelaan 92, 9820, Merelbeke, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281, 9000, Ghent, Belgium.
BMC Genomics. 2020 Oct 22;21(1):733. doi: 10.1186/s12864-020-07126-4.
Microorganisms are not only indispensable to ecosystem functioning, they are also keystones for emerging technologies. In the last 15 years, the number of studies on environmental microbial communities has increased exponentially due to advances in sequencing technologies, but the large amount of data generated remains difficult to analyze and interpret. Recently, metabarcoding analysis has shifted from clustering reads into Operational Taxonomic Units (OTUs) to resolving Amplicon Sequence Variants (ASVs). Differences between these methods can seriously affect the biological interpretation of metabarcoding data, especially in ecosystems with high microbial diversity, because the methods have been benchmarked on low-diversity datasets.
In this work we have thoroughly examined the differences in community diversity, structure, and complexity between the OTU and ASV methods. We have examined culture-based mock and simulated datasets as well as soil- and plant-associated bacterial and fungal environmental communities. Four key findings were revealed. First, analysis of microbial datasets at family level guaranteed both consistency and adequate coverage when using either method. Second, the performance of both methods is related to community diversity and sample sequencing depth. Third, the choice of method affected sample diversity and the number of families detected as differentially abundant upon treatment; this may lead researchers to draw different biological conclusions. Fourth, the observed differences can mostly be attributed to low-abundance families (relative abundance < 0.1%), thus extra care is recommended when studying rare species using metabarcoding. The ASV method used outperformed the adopted OTU method concerning community diversity, especially for fungus-related sequences, but only when the sequencing depth was sufficient to capture the community complexity.
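The family-level aggregation and the 0.1% relative-abundance cut-off mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the feature identifiers, the counts, and the family assignments are all hypothetical and serve only to show how features might be summed per family and how low-abundance families might be flagged.

```python
from collections import defaultdict


def family_relative_abundance(counts, taxonomy):
    """Sum feature (OTU or ASV) counts per family and convert to relative abundance."""
    totals = defaultdict(int)
    for feature, n in counts.items():
        # Features without a family assignment are pooled as "Unassigned".
        totals[taxonomy.get(feature, "Unassigned")] += n
    depth = sum(totals.values())
    if depth == 0:
        return {}
    return {family: n / depth for family, n in totals.items()}


def split_by_abundance(rel_abund, threshold=0.001):
    """Return (abundant, rare) family sets; rare means below the threshold (0.1% by default)."""
    rare = {family for family, p in rel_abund.items() if p < threshold}
    return set(rel_abund) - rare, rare


# Hypothetical read counts for a single sample (assumed values, not from the study).
counts = {"asv1": 6000, "asv2": 3000, "asv3": 995, "asv4": 5}
taxonomy = {
    "asv1": "Bacillaceae",
    "asv2": "Bacillaceae",
    "asv3": "Pseudomonadaceae",
    "asv4": "Nectriaceae",
}

rel = family_relative_abundance(counts, taxonomy)
abundant, rare = split_by_abundance(rel)
```

Running the same two functions on the OTU table and the ASV table of one sample, and comparing the resulting `rare` sets, is one simple way to see where the two methods disagree.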
Investigation of metabarcoding data should be done with care. Correct biological interpretation depends on several factors, including in-depth sequencing of the samples, choice of the most appropriate filtering strategy for the specific research goal, and use of family level for data clustering.