DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses.

Authors

Zepeda-Mendoza Marie Lisandra, Bohmann Kristine, Carmona Baez Aldo, Gilbert M Thomas P

Affiliations

Evogenomics, Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350, Copenhagen, Denmark.

Undergraduate Program on Genomic Sciences, Center for Genomic Sciences, National Autonomous University of Mexico (UNAM), Av. Universidad s/n Col. Chamilpa, 62210, Cuernavaca, Morelos, Mexico.

Publication information

BMC Res Notes. 2016 May 3;9:255. doi: 10.1186/s13104-016-2064-9.

DOI:10.1186/s13104-016-2064-9

PMID:27142414

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4855357/

Abstract

BACKGROUND

DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5'-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way.

RESULTS

We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe.
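The three parameters listed above can be illustrated with a minimal Python sketch. This is not DAMe's own code: the function names (`jaccard_similarity`, `filter_replicates`), the thresholds, and the use of the Jaccard index as the similarity measure are all assumptions made for illustration, since the abstract does not specify which metric or defaults the toolkit uses.

```python
from collections import Counter


def jaccard_similarity(rep_a, rep_b):
    """Sequence-content similarity between two PCR replicates.

    Assumption: Jaccard index on the sets of unique sequences; the
    abstract does not name the metric DAMe actually uses.
    """
    seqs_a, seqs_b = set(rep_a), set(rep_b)
    if not seqs_a and not seqs_b:
        return 1.0
    return len(seqs_a & seqs_b) / len(seqs_a | seqs_b)


def filter_replicates(replicates, min_copies=2, min_replicates=2, min_length=0):
    """Keep sequences that are reproducible across PCR replicates.

    replicates: list of Counter objects (sequence -> copy number),
                one per PCR replicate of the same sample.
    A sequence is kept if it is at least `min_length` long and reaches
    `min_copies` in at least `min_replicates` replicates. The returned
    copy number sums only the replicates that passed the threshold.
    All thresholds are hypothetical defaults.
    """
    kept = {}
    for seq in set().union(*replicates):
        if len(seq) < min_length:
            continue
        passing = [rep[seq] for rep in replicates if rep.get(seq, 0) >= min_copies]
        if len(passing) >= min_replicates:
            kept[seq] = sum(passing)
    return kept
```

With three replicates in which a sequence appears once, not at all, and twice, the default thresholds would discard it, because it reaches two copies in only one replicate. Raising `min_replicates` makes the filter stricter at the cost of losing rare but genuine sequences.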

CONCLUSIONS

DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
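Steps (i) and (ii) above, collapsing amplicons into unique sequences per tag combination and flagging unused combinations, can be sketched as follows. This is a simplified illustration rather than DAMe's implementation: the function name `sort_by_tags`, the input layout, and the assumptions that primers are already trimmed, that tags match exactly, and that the reverse tag is given as it appears at the end of the read are all hypothetical.

```python
from collections import Counter, defaultdict


def sort_by_tags(reads, fwd_tags, rev_tags, used_combos):
    """Sort reads by tag combination and collapse them into unique sequences.

    reads:       iterable of read strings laid out as tag + insert + tag
                 (assumption: primers already removed, exact tag matches).
    fwd_tags:    dict of tag name -> tag sequence at the read start.
    rev_tags:    dict of tag name -> tag sequence at the read end.
    used_combos: dict of (fwd_name, rev_name) -> sample identifier.

    Returns (sorted_counts, unused), where sorted_counts maps
    (sample, combo) -> Counter of insert sequence -> copy number, and
    unused counts reads carrying tag combinations assigned to no sample.
    """
    sorted_counts = defaultdict(Counter)
    unused = Counter()
    for read in reads:
        fwd = next((n for n, t in fwd_tags.items() if read.startswith(t)), None)
        rev = next((n for n, t in rev_tags.items() if read.endswith(t)), None)
        if fwd is None or rev is None:
            continue  # no recognisable tag pair; skipped in this sketch
        insert = read[len(fwd_tags[fwd]): len(read) - len(rev_tags[rev])]
        combo = (fwd, rev)
        if combo in used_combos:
            sorted_counts[(used_combos[combo], combo)][insert] += 1
        else:
            unused[combo] += 1
    return sorted_counts, unused
```

Keeping the unused combinations in a separate counter, instead of discarding them, lets the user inspect how many reads carry tag pairs that were never assigned to a sample.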

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35e3/4855357/73311c44fe5e/13104_2016_2064_Fig1_HTML.jpg
