To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography.

Affiliations

Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB-CSIC), Blanes (Girona), Catalonia, Spain.

Department of Evolutionary Biology, Ecology and Environmental Sciences, University of Barcelona and Research Institute of Biodiversity (IRBIO), Barcelona, Catalonia, Spain.

Publication information

BMC Bioinformatics. 2021 Apr 5;22(1):177. doi: 10.1186/s12859-021-04115-6.

DOI:10.1186/s12859-021-04115-6

PMID:33820526

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8020537/

Abstract

BACKGROUND

The recent blooming of metabarcoding applications to biodiversity studies comes with some relevant methodological debates. One such issue concerns the treatment of reads by denoising or by clustering methods, which have been wrongly presented as alternatives. It has also been suggested that denoised sequence variants should replace clusters as the basic unit of metabarcoding analyses, missing the fact that sequence clusters are a proxy for species-level entities, the basic unit in biodiversity studies. We argue here that methods developed and tested for ribosomal markers have been uncritically applied to highly variable markers such as cytochrome oxidase I (COI) without conceptual or operational (e.g., parameter setting) adjustment. COI has a naturally high intraspecies variability that should be assessed and reported, as it is a source of highly valuable information. We contend that denoising and clustering are not alternatives. Rather, they are complementary and both should be used together in COI metabarcoding pipelines.

RESULTS

Using a COI dataset from benthic marine communities, we compared two denoising procedures (based on the UNOISE3 and the DADA2 algorithms), set suitable parameters for denoising and clustering, and applied these steps in different orders. Our results indicated that the UNOISE3 algorithm preserved a higher intra-cluster variability. We introduce the program DnoisE to implement the UNOISE3 algorithm taking into account the natural variability (measured as entropy) of each codon position in protein-coding genes. This correction increased the number of sequences retained by 88%. The order of the steps (denoising and clustering) had little influence on the final outcome.
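The entropy-based correction described above can be illustrated with a minimal sketch. It is not the DnoisE implementation. It assumes the sequences are aligned, of equal length, and in frame from the first base. The weighting in `weighted_distance` (scaling each mismatch by the entropy of its codon position relative to the mean of the three positions) is our assumption about how such a correction might be expressed. The function names and the `alpha` value are illustrative only. The abundance-ratio threshold follows the published UNOISE form, beta(d) = 1 / 2^(alpha * d + 1).

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the bases observed in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def codon_position_entropy(seqs):
    """Mean column entropy for codon positions 1, 2 and 3.

    Assumes all sequences are aligned, equally long and start in frame.
    """
    sums = [0.0, 0.0, 0.0]
    n_cols = [0, 0, 0]
    for i in range(len(seqs[0])):
        pos = i % 3
        sums[pos] += column_entropy([s[i] for s in seqs])
        n_cols[pos] += 1
    return [sums[p] / n_cols[p] for p in range(3)]


def weighted_distance(a, b, entropies):
    """Mismatch count in which each difference is weighted by the entropy of
    its codon position (hypothetical weighting; requires sum(entropies) > 0)."""
    total = sum(entropies)
    return sum(
        entropies[i % 3] * 3 / total
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    )


def max_abundance_ratio(d, alpha):
    """UNOISE-style threshold: a rarer sequence at distance d from a more
    abundant one is treated as noise if its abundance ratio is at or below this."""
    return 1.0 / 2 ** (alpha * d + 1)


# Toy alignment: only the third position of the second codon varies.
toy = ["ATGAAA", "ATGAAC", "ATGAAT", "ATGAAG"]
entropies = codon_position_entropy(toy)
d = weighted_distance("ATGAAA", "ATGAAC", entropies)
threshold = max_abundance_ratio(d, alpha=5)  # alpha chosen for illustration
```

In this sketch, a mismatch at a naturally variable position yields a larger weighted distance and hence a lower threshold, so fewer such variants are merged as noise. This is the direction of the effect reported above, where the correction increased the number of sequences retained.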

CONCLUSIONS

We highlight the need to combine denoising and clustering, with an adequate choice of stringency parameters, in COI metabarcoding. We present a program that uses the coding properties of this marker to improve the denoising step. We recommend that researchers report their results in terms of both denoised sequences (a proxy for haplotypes) and the clusters formed (a proxy for species), and that they avoid collapsing the sequences of each cluster into a single representative. This will allow studies at the cluster level (ideally equating to species-level diversity) and at the intra-cluster level, and will ease additivity and comparability between studies.
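The reporting recommendation above can be sketched as a data structure that keeps every denoised sequence inside its cluster instead of collapsing the cluster to one representative. The greedy identity-based clustering below is only a stand-in for a real clustering tool. The function names, the 0.9 identity threshold, and the toy read counts are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_keep_members(denoised, min_identity):
    """Greedy clustering that retains all member sequences and their counts.

    denoised: dict mapping each denoised sequence to its read count.
    Returns a list of clusters, each with a centroid and a members dict.
    """
    clusters = []
    # Visit sequences from most to least abundant (ties broken alphabetically).
    for seq, count in sorted(denoised.items(), key=lambda kv: (-kv[1], kv[0])):
        for cluster in clusters:
            if identity(seq, cluster["centroid"]) >= min_identity:
                cluster["members"][seq] = count  # keep the haplotype, do not collapse
                break
        else:
            clusters.append({"centroid": seq, "members": {seq: count}})
    return clusters


# Toy input: two close variants and one distant sequence.
toy_counts = {"AAAAAAAAAA": 100, "AAAAAAAAAT": 10, "CCCCCCCCCC": 50}
clusters = cluster_keep_members(toy_counts, min_identity=0.9)

# Cluster-level table (proxy for species) and sequence-level table (proxy for haplotypes).
cluster_totals = [sum(c["members"].values()) for c in clusters]
sequence_table = [
    (idx, seq, n) for idx, c in enumerate(clusters) for seq, n in c["members"].items()
]
```

Keeping both tables lets the same output serve analyses at the cluster level and at the intra-cluster level.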

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dd7/8020537/9fcd5b4e7e79/12859_2021_4115_Fig1_HTML.jpg
