Mycotoxin Prevention and Applied Microbiology, USDA ARS, Peoria, IL, USA.
Mol Ecol Resour. 2018 May;18(3):541-556. doi: 10.1111/1755-0998.12760. Epub 2018 Feb 17.
Microbial ecology has been profoundly advanced by the ability to profile complex microbial communities by sequencing marker genes amplified from environmental samples. However, inclusion of appropriate controls is vital to revealing the limitations and biases of this technique. "Mock community" samples, in which the composition and relative abundances of community members are known, are particularly valuable for guiding library preparation and data processing decisions. I generated a set of three mock communities using 19 different fungal taxa and demonstrate their utility by contrasting amplicon sequencing data obtained for the same communities under modifications to PCR conditions during library preparation. Increasing the number of PCR cycles elevated the rate of chimera formation and the error rate in the final data set. Extension time during PCR had little impact on chimera formation, error rate or observed community structure. Polymerase fidelity significantly affected error rates. Despite a high error rate, a master mix optimized to minimize amplification bias yielded profiles that were most similar to the true community structure. Bias against particular taxa differed between the ITS1 and ITS2 loci. Preclustering nearly identical reads substantially reduced error rates, but did not improve similarity to the expected community structure. Inaccuracies in amplicon sequence-based estimates of fungal community structure were associated with amplification bias and size selection processes, as well as variable culling rates among reads from different taxa. In some cases, the numerically dominant taxon was completely absent from final data sets, highlighting the need for further methodological improvements to avoid biased observations of community profiles.
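The comparison of an observed profile against a mock community's known composition can be illustrated with a minimal sketch. It assumes Bray-Curtis dissimilarity as the similarity measure, which the abstract does not specify. The taxon names, expected proportions and read counts are invented purely for illustration and are not data from the study.

```python
# Illustrative sketch only: the metric (Bray-Curtis) and all values below
# are assumptions, not the method or data reported in the abstract.

def relative_abundance(counts):
    """Convert per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical profiles; 1 means no taxa shared.
    Taxa missing from one profile are treated as abundance 0.
    """
    taxa = set(p) | set(q)
    numerator = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    denominator = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return numerator / denominator


# Hypothetical mock community: known (expected) proportions.
expected = {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2}

# Hypothetical read counts in which the dominant taxon dropped out entirely.
observed_counts = {"taxon_A": 0, "taxon_B": 600, "taxon_C": 400}
observed = relative_abundance(observed_counts)

d = bray_curtis(expected, observed)
print(f"Bray-Curtis dissimilarity to expected profile: {d:.3f}")
```

In this made-up case the loss of the dominant taxon alone moves the profile halfway to complete dissimilarity, which is the kind of distortion a mock community control makes visible.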