Koskinen Kaisa, Auvinen Petri, Björkroth K Johanna, Hultman Jenni
1 Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
2 Department of Food Hygiene and Environmental Health, Faculty of Veterinary Medicine, University of Helsinki, Helsinki, Finland.
J Comput Biol. 2015 Aug;22(8):743-51. doi: 10.1089/cmb.2014.0268. Epub 2014 Dec 19.
Natural microbial communities have been studied for decades using the 16S rRNA gene as a marker. In recent years, the application of second-generation sequencing technologies has revolutionized our understanding of the structure and function of microbial communities in complex environments. With these highly parallel techniques, a detailed description of community characteristics can be constructed, and even the rare biosphere can be detected. The new approaches carry numerous advantages and lack many of the features that skewed results obtained with traditional techniques, but we still face serious bias and a lack of reliable comparability between results. Here, we contrasted publicly available amplicon sequence data analysis algorithms using two different data sets: one with a defined clone-based structure and one from a well-studied food spoilage community. We aimed to assess which software and parameters produce results that best resemble the benchmark community, how large the differences between methods are, and whether these differences are statistically significant. The results suggest that commonly accepted denoising and clustering methods, used in different combinations, produce significantly different outcomes: the clustering method greatly affects the number of operational taxonomic units (OTUs), whereas the denoising algorithm has more influence on taxonomic affiliations. The OTU number differed by up to 40-fold, and the disparity between results seemed highly dependent on community structure and diversity. Statistically significant differences in taxonomies between methods were seen even at the phylum level. However, applying an effective denoising method seemed to even out the differences produced by clustering.