Commonwealth Scientific and Industrial Research Organisation (CSIRO) Ecosystem Sciences, Canberra, Australian Capital Territory, Australia.
PLoS One. 2013 Aug 26;8(8):e71974. doi: 10.1371/journal.pone.0071974. eCollection 2013.
Accurate estimation of biological diversity in environmental DNA samples using high-throughput amplicon pyrosequencing must account for errors generated by PCR and sequencing. We describe a novel approach that distinguishes the underlying sequence diversity in environmental DNA samples from errors by using information on the abundance distribution of similar sequences across independent samples, as well as the frequency and diversity of sequences within individual samples. We have further refined this approach into a bioinformatics pipeline, the Amplicon Pyrosequence Denoising Program (APDP), which processes raw sequence datasets into a set of validated sequences in formats compatible with commonly used downstream analysis packages. We demonstrate, by sequencing complex environmental samples and mock communities, that APDP is effective for removing errors from deeply sequenced datasets comprising biological and technical replicates, and can efficiently denoise single-sample datasets. APDP provides more conservative diversity estimates for complex datasets than other approaches; however, for some applications this may provide a more accurate and appropriate level of resolution, and result in greater confidence that returned sequences reflect the diversity of the underlying sample.
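The abstract's central idea, combining evidence across independent samples with the abundance of similar sequences within each sample, can be illustrated with a small Python sketch. This is not APDP's actual algorithm: the function names, the Hamming-distance similarity measure, and the thresholds `min_samples`, `max_dist`, and `min_ratio` are all assumptions chosen for illustration. A sequence is kept only if, in at least `min_samples` samples, it is not a rare near-copy of a far more abundant sequence.

```python
from collections import Counter


def hamming(a, b):
    """Mismatch count between two sequences; infinite if lengths differ."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))


def validate_sequences(samples, min_samples=2, max_dist=1, min_ratio=0.1):
    """Illustrative filter, NOT the published APDP method.

    samples: dict mapping sample name -> Counter(sequence -> read count).
    All thresholds are hypothetical defaults.
    Returns the set of sequences retained as likely genuine.
    """
    # Cross-sample evidence: number of samples in which each sequence occurs.
    occurrence = Counter()
    for counts in samples.values():
        occurrence.update(counts.keys())

    retained = set()
    for seq, n_samples in occurrence.items():
        if n_samples < min_samples:
            continue
        # Within-sample evidence: count the samples where seq is not
        # "shadowed" by a similar sequence that is far more abundant.
        independent = 0
        for counts in samples.values():
            if seq not in counts:
                continue
            shadowed = any(
                other != seq
                and hamming(seq, other) <= max_dist
                and counts[seq] < min_ratio * n
                for other, n in counts.items()
            )
            if not shadowed:
                independent += 1
        if independent >= min_samples:
            retained.add(seq)
    return retained


# Toy data: "ACGA" is a rare one-mismatch neighbour of "ACGT" in two samples,
# and "GGGG" appears in only one sample.
toy_samples = {
    "s1": Counter({"ACGT": 100, "ACGA": 2, "TTTT": 50}),
    "s2": Counter({"ACGT": 80, "ACGA": 1, "TTTT": 40, "GGGG": 5}),
    "s3": Counter({"ACGT": 90, "TTTT": 60}),
}
validated = validate_sequences(toy_samples)
```

In this toy input, `ACGA` is dropped because it only ever appears as a low-frequency neighbour of `ACGT`, and `GGGG` is dropped because it lacks support from a second sample. Stricter thresholds give more conservative diversity estimates, which is the trade-off the abstract describes.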