Sinclair Lucas, Osman Omneya Ahmed, Bertilsson Stefan, Eiler Alexander
Department of Ecology and Genetics, Limnology, and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
PLoS One. 2015 Feb 3;10(2):e0116955. doi: 10.1371/journal.pone.0116955. eCollection 2015.
As new sequencing technologies become cheaper and older ones disappear, laboratories switch vendors and platforms. Validating new setups is a crucial part of conducting rigorous scientific research. Here we report on the reliability and biases of bacterial 16S rRNA gene amplicon paired-end sequencing on the Illumina MiSeq platform. We designed a protocol using 50 barcode pairs to run samples in parallel and wrote a pipeline to process the data. By sequencing the same sediment sample in 248 replicates, as well as 70 samples from alkaline soda lakes, we evaluated the performance of the method with regard to estimates of alpha and beta diversity. Across the different purification and DNA quantification procedures tested, we consistently found up to 5-fold differences in sequence yield between individually barcoded samples. One-step and two-step PCR preparations yielded significantly different estimates of both alpha and beta diversity. Compared with a previous method based on 454 pyrosequencing, our Illumina protocol performed similarly, except for evenness estimates, for which correspondence between the two methods was low. We further quantified the data loss at each processing step, which eventually accumulated to 50% of the raw reads. When evaluating different OTU clustering methods, we observed a stark contrast in the number of OTUs generated by QIIME with default settings and by the more recent UPARSE algorithm. Nevertheless, overall trends in alpha and beta diversity agreed closely between the two clustering methods. Our procedure performed well in terms of the precision of alpha and beta diversity estimates, and individual barcodes had no significant effect. Comparative analyses suggest that 454 and Illumina sequence data can be combined to describe patterns in richness, beta diversity and taxonomic composition, provided the same PCR protocol and bioinformatic workflow are used.
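The abstract compares alpha-diversity estimates (richness and evenness) across protocols, platforms and clustering methods, and reports up to 5-fold differences in per-sample yield. As an illustration only, the following minimal Python sketch rarefies one sample's OTU counts to a common depth and then computes observed richness, the Shannon index and Pielou's evenness. The choice of Pielou's J as the evenness index, the function names `rarefy` and `alpha_diversity`, and the example counts are hypothetical assumptions, not details taken from the authors' pipeline.

```python
import math
import random
from collections import Counter
from typing import Sequence


def rarefy(counts: Sequence[int], depth: int, seed: int = 0) -> list[int]:
    """Subsample `depth` reads without replacement from one sample's OTU counts.

    Hypothetical helper: rarefying to a common depth is one common way to make
    richness comparable between samples with very different yields.
    """
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's read count")
    # Expand counts into one OTU index per read, then draw without replacement
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(random.Random(seed).sample(reads, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def alpha_diversity(counts: Sequence[int]) -> dict:
    """Observed richness, Shannon index (natural log) and Pielou's evenness."""
    # Keep only OTUs actually observed in this sample
    observed = [c for c in counts if c > 0]
    total = sum(observed)
    if total == 0:
        raise ValueError("sample has no reads")
    richness = len(observed)
    # Shannon H' = -sum(p_i * ln p_i) over observed OTUs
    shannon = -sum((c / total) * math.log(c / total) for c in observed)
    # Pielou's J = H' / ln(S); undefined for S = 1, returned as NaN here
    evenness = shannon / math.log(richness) if richness > 1 else float("nan")
    return {"richness": richness, "shannon": shannon, "evenness": evenness}


if __name__ == "__main__":
    even = alpha_diversity([10, 10, 10, 10])    # evenness ≈ 1.0
    skewed = alpha_diversity([97, 1, 1, 1])     # same richness, lower evenness
    print(even, skewed)
    print(alpha_diversity(rarefy([500, 300, 200, 5], depth=100)))
```

Because observed richness grows with sequencing depth, samples whose yields differ several-fold are usually brought to a common depth before richness is compared; evenness indices are less depth-sensitive but depend on which index is chosen, which is one reason different methods can disagree on evenness.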