Leray Matthieu, Knowlton Nancy
National Museum of Natural History, Smithsonian Institution, Washington, D.C., USA; Smithsonian Tropical Research Institute, Smithsonian Institution, Panama City, Balboa, Ancon, Republic of Panama.
National Museum of Natural History, Smithsonian Institution, Washington, D.C., USA.
PeerJ. 2017 Mar 22;5:e3006. doi: 10.7717/peerj.3006. eCollection 2017.
DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study.
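The claim that random sampling during sequencing alone can make rare OTUs irreproducible can be illustrated with a toy simulation. This is a minimal sketch, not the authors' analysis: it assumes each sequencing replicate draws a fixed number of reads at random, with replacement, from an amplicon pool of fixed composition, and it ignores PCR bias entirely. The OTU names, proportions, sequencing depth, and the helper names `simulate_replicates` and `detection_rate` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def simulate_replicates(proportions: Dict[str, float], depth: int,
                        n_replicates: int, seed: int = 0) -> List[Counter]:
    """Toy model (assumption): each sequencing replicate draws `depth` reads
    with replacement from a fixed pool with the given OTU proportions."""
    rng = random.Random(seed)
    otus = list(proportions)
    weights = [proportions[otu] for otu in otus]
    runs = []
    for _ in range(n_replicates):
        # Random sampling of reads: the pool is identical in every replicate.
        reads = rng.choices(otus, weights=weights, k=depth)
        runs.append(Counter(reads))
    return runs


def detection_rate(runs: List[Counter], otu: str) -> float:
    """Fraction of replicates in which `otu` received at least one read."""
    return sum(1 for run in runs if run[otu] > 0) / len(runs)


# Hypothetical pool: one dominant OTU, one moderately common, one rare.
pool = {"common": 0.9, "mid": 0.0999, "rare": 0.0001}
runs = simulate_replicates(pool, depth=5000, n_replicates=100, seed=1)
for otu in pool:
    print(otu, detection_rate(runs, otu))
```

Under these assumptions the rare OTU is expected to receive only 0.5 reads per run, so its per-run detection probability is 1 − (1 − 0.0001)^5000 ≈ 0.39: it appears in some replicates and not others even though the pool never changes, while the common OTUs are detected every time.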
Systematic removal of rare OTUs may avoid inflating diversity based on common descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
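The abstract recommends technical replicates and leaves the handling of rare OTUs to the study's objectives. One common way to combine the two, sketched here as an assumption rather than as the authors' pipeline, is to keep only OTUs observed in at least a minimum number of replicates, optionally above a minimum read count per replicate. The function name `filter_otus`, the OTU labels, and the default thresholds are hypothetical.

```python
from typing import Dict, List, Set


def filter_otus(replicates: List[Dict[str, int]], min_replicates: int = 2,
                min_reads: int = 1) -> Set[str]:
    """Hypothetical replicate filter: keep OTUs with at least `min_reads`
    reads in at least `min_replicates` technical replicates."""
    occurrences: Dict[str, int] = {}
    for rep in replicates:
        for otu, n_reads in rep.items():
            if n_reads >= min_reads:
                occurrences[otu] = occurrences.get(otu, 0) + 1
    return {otu for otu, k in occurrences.items() if k >= min_replicates}


# Hypothetical read counts from three technical replicates of one sample.
reps = [
    {"A": 500, "B": 3, "C": 1},
    {"A": 420, "B": 5},
    {"A": 610, "D": 2},
]
print(sorted(filter_otus(reps)))               # OTUs seen in >= 2 replicates
print(sorted(filter_otus(reps, min_reads=4)))  # stricter per-replicate threshold
```

Raising either threshold removes more sporadic OTUs such as C and D, but, as the abstract notes, it also risks discarding genuine rare taxa such as small associated fauna, so the thresholds are a study-specific choice.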