DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses.

Author information

Zepeda-Mendoza Marie Lisandra, Bohmann Kristine, Carmona Baez Aldo, Gilbert M Thomas P

Affiliations

Evogenomics, Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark.

Undergraduate Program on Genomic Sciences, Center for Genomic Sciences, National Autonomous University of Mexico (UNAM), Av. Universidad s/n Col. Chamilpa, 62210, Cuernavaca, Morelos, Mexico.

Publication information

BMC Res Notes. 2016 May 3;9:255. doi: 10.1186/s13104-016-2064-9.

Abstract

BACKGROUND

DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing, it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5' nucleotide tags to both primers, producing double-tagged amplicons, together with the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way.

RESULTS

We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate the dissimilarity between PCR replicates, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment by applying it to two real datasets that vary in complexity regarding the number of samples, sequencing libraries, PCR replicates, and tag combinations used. Finally, we use a third, mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe.
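The three parameters listed above can be illustrated with a minimal Python sketch. This is not DAMe's code or interface: the function names `jaccard` and `filter_sequences`, the thresholds `min_copies`, `min_replicates`, and `min_length`, and the input layout (one `Counter` of sequence to copy number per PCR replicate) are assumptions made for illustration. The abstract does not name the similarity measure, so the Jaccard index is used here purely as a stand-in.

```python
from collections import Counter


def jaccard(rep_a, rep_b):
    """Sequence-content similarity of two replicates (illustrative stand-in measure).

    Each replicate is a mapping of unique sequence -> copy number;
    only presence/absence of sequences is compared here.
    """
    a, b = set(rep_a), set(rep_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def filter_sequences(replicates, min_copies=2, min_replicates=2, min_length=0):
    """Keep sequences that are long enough and reproducible across replicates.

    A replicate "supports" a sequence if it holds at least min_copies copies.
    A sequence is kept if at least min_replicates replicates support it.
    Returns a dict of kept sequence -> summed copy number over supporting replicates.
    """
    kept = {}
    all_sequences = set().union(*replicates)
    for seq in all_sequences:
        if len(seq) < min_length:
            continue
        supporting = [rep[seq] for rep in replicates if rep.get(seq, 0) >= min_copies]
        if len(supporting) >= min_replicates:
            kept[seq] = sum(supporting)
    return kept


# Hypothetical toy data: three PCR replicates of one sample.
rep1 = Counter({"ACGT": 5, "AAAA": 1})
rep2 = Counter({"ACGT": 3, "TTTT": 4})
rep3 = Counter({"ACGT": 1, "TTTT": 2})

similarity = jaccard(rep1, rep2)
kept = filter_sequences([rep1, rep2, rep3], min_copies=2, min_replicates=2)
```

In this toy example, the singleton `AAAA` is dropped because no replicate holds it at two or more copies, which is the kind of low-copy, non-reproducible sequence the filtering step is meant to remove.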

CONCLUSIONS

DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built into one or more sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
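Steps (i) and (ii) above, sorting by tag combination, collapsing into unique sequences, and flagging unused tag combinations, can be sketched as follows. This is a simplified illustration rather than DAMe's implementation: the function name `sort_by_tags`, the fixed `tag_length`, and the `used_combinations` mapping are hypothetical. The sketch also assumes reads that start and end exactly with their tags, ignoring primers and read orientation.

```python
from collections import Counter, defaultdict


def sort_by_tags(reads, tag_length, used_combinations):
    """Sort tagged reads by (forward tag, reverse tag) and collapse duplicates.

    reads: iterable of strings laid out as forward tag + insert + reverse tag
           (a simplifying assumption; primers and orientation are ignored).
    used_combinations: dict mapping (fwd_tag, rev_tag) -> sample/replicate label.

    Returns (assigned, unused):
      assigned: label -> Counter of unique insert sequence -> copy number
      unused:   Counter of tag combinations not present in used_combinations
    """
    assigned = defaultdict(Counter)
    unused = Counter()
    for read in reads:
        if len(read) <= 2 * tag_length:
            continue  # too short to contain both tags plus an insert
        fwd, rev = read[:tag_length], read[-tag_length:]
        insert = read[tag_length:-tag_length]
        label = used_combinations.get((fwd, rev))
        if label is None:
            unused[(fwd, rev)] += 1
        else:
            assigned[label][insert] += 1
    return assigned, unused


# Hypothetical toy input: 2-nt tags, one tag combination in use.
reads = ["AAGATTACACC", "AAGATTACACC", "AATTTTCC", "GGGATTACATT", "AACC"]
used = {("AA", "CC"): "sample1_rep1"}
assigned, unused = sort_by_tags(reads, tag_length=2, used_combinations=used)
```

Counting reads under unused tag combinations separately, instead of discarding them silently, keeps them available for inspection, in line with point (ii) above.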

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35e3/4855357/73311c44fe5e/13104_2016_2064_Fig1_HTML.jpg
