Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway.
Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway.
Microbiome. 2017 Jul 6;5(1):68. doi: 10.1186/s40168-017-0279-1.
Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized.
We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes into each sample, with the possibility of adding a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms.
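To illustrate why heterogeneity spacers help with low complexity, the following minimal sketch counts how many distinct bases appear at each sequencing cycle with and without spacers. The primer sequence, the spacer sequences, and the function name are hypothetical placeholders chosen for illustration; they are not the oligos used in this study.

```python
# Illustrative sketch only: PRIMER and SPACERS are made-up sequences,
# not the oligos described in the paper.

def per_cycle_diversity(reads, n_cycles):
    """Return the number of distinct bases observed at each cycle."""
    return [len({read[i] for read in reads if i < len(read)})
            for i in range(n_cycles)]

PRIMER = "GTGCCAGCAGCCGCGGTAA"             # hypothetical conserved primer region
SPACERS = ["", "T", "GT", "CGT", "ACGT"]   # hypothetical spacers of 0 to 4 nt

# Without spacers every read starts with the same primer bases.
reads_plain = [PRIMER for _ in SPACERS]
# With spacers the primer is shifted by a different offset in each read.
reads_spaced = [spacer + PRIMER for spacer in SPACERS]

div_plain = per_cycle_diversity(reads_plain, 10)
div_spaced = per_cycle_diversity(reads_spaced, 10)

print("no spacers:  ", div_plain)
print("with spacers:", div_spaced)
```

Without spacers every cycle sees a single base across all reads, whereas the staggered reads present several different bases per cycle, which is the property that reduces reliance on a PhiX spike-in.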
The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost.
Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
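The oligo savings from combinatorial indexing can be sketched with simple arithmetic. The example below assumes a fully combinatorial design in which every index at one position can be paired with every index at the other positions, and one oligo is needed per index; the index counts (8, 12, 20) are hypothetical and are not taken from this study.

```python
from math import prod

def combinatorial_capacity(index_counts):
    """Number of uniquely labelled samples when indices are combined freely."""
    return prod(index_counts)

def oligos_required(index_counts):
    """Number of indexed oligos needed, assuming one oligo per index."""
    return sum(index_counts)

# Hypothetical numbers of distinct indices at each of three positions.
index_counts = (8, 12, 20)

samples = combinatorial_capacity(index_counts)
oligos = oligos_required(index_counts)

print(f"{oligos} indexed oligos can label up to {samples} samples")
```

Under these assumptions the number of samples grows as the product of the index counts while the number of oligos grows only as their sum, so adding a further index position multiplies capacity at a small additional oligo cost.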