Palmer Jonathan M, Jusino Michelle A, Banik Mark T, Lindner Daniel L
Center for Forest Mycology Research, Northern Research Station, USDA Forest Service, Madison, WI, USA.
PeerJ. 2018 May 28;6:e4925. doi: 10.7717/peerj.4925. eCollection 2018.
High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique for characterizing microbial communities. Recently, spike-in mock communities have been used to measure the accuracy of sequencing platforms and data analysis pipelines. To assess how well sequencing platforms and data processing pipelines perform with fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable-length amplicons are critically important, and (3) a major source of bias is attributable to the initial polymerase chain reaction (PCR), so HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to handle variable-length amplicons and to quality-filter HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and the AMPtk software can be applied to any HTAS dataset to improve data quality.
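The idea of quality filtering based on a spike-in control can be illustrated with a minimal Python sketch. It is not AMPtk's implementation: the table layout, the function names (`bleed_rate`, `filter_table`), the sample and OTU labels, and the thresholding rule are all assumptions made for illustration. The sketch estimates how much of a synthetic control leaks into other samples, then zeroes any count that falls at or below that fraction of its OTU's total reads.

```python
from typing import Dict, Set

# Hypothetical layout: sample name -> OTU name -> read count.
Table = Dict[str, Dict[str, int]]


def bleed_rate(table: Table, mock_sample: str, synthetic_otus: Set[str]) -> float:
    """Largest fraction of any synthetic OTU's reads found outside the mock sample."""
    worst = 0.0
    for otu in synthetic_otus:
        total = sum(counts.get(otu, 0) for counts in table.values())
        if total == 0:
            continue
        outside = total - table[mock_sample].get(otu, 0)
        worst = max(worst, outside / total)
    return worst


def filter_table(table: Table, rate: float) -> Table:
    """Zero every count that is at or below rate * (total reads of that OTU)."""
    otu_totals: Dict[str, int] = {}
    for counts in table.values():
        for otu, n in counts.items():
            otu_totals[otu] = otu_totals.get(otu, 0) + n
    return {
        sample: {
            otu: (n if n > rate * otu_totals[otu] else 0)
            for otu, n in counts.items()
        }
        for sample, counts in table.items()
    }


# Made-up counts: Syn1/Syn2 are synthetic control sequences, OTU_A is a real taxon.
table: Table = {
    "mock":  {"Syn1": 990, "Syn2": 1000, "OTU_A": 5},
    "soil1": {"Syn1": 8,   "Syn2": 0,    "OTU_A": 995},
    "soil2": {"Syn1": 2,   "Syn2": 0,    "OTU_A": 1000},
}

rate = bleed_rate(table, "mock", {"Syn1", "Syn2"})
filtered = filter_table(table, rate)
```

Because the synthetic sequences cannot occur in real samples, any of their reads found outside the mock sample can be attributed to cross-sample artifacts, which is what makes a non-biological control convenient for choosing such a threshold. A single global rate is the simplest choice here; per-run or per-sample rates would be an equally plausible design.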