To denoise or to cluster, that is not the question: optimizing pipelines for COI metabarcoding and metaphylogeography.

Author information

Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB-CSIC), Blanes (Girona), Catalonia, Spain.

Department of Evolutionary Biology, Ecology and Environmental Sciences, University of Barcelona and Research Institute of Biodiversity (IRBIO), Barcelona, Catalonia, Spain.

Publication information

BMC Bioinformatics. 2021 Apr 5;22(1):177. doi: 10.1186/s12859-021-04115-6.

Abstract

BACKGROUND

The recent blooming of metabarcoding applications to biodiversity studies comes with some relevant methodological debates. One such issue concerns the treatment of reads by denoising or by clustering methods, which have been wrongly presented as alternatives. It has also been suggested that denoised sequence variants should replace clusters as the basic unit of metabarcoding analyses, missing the fact that sequence clusters are a proxy for species-level entities, the basic unit in biodiversity studies. We argue here that methods developed and tested for ribosomal markers have been uncritically applied to highly variable markers such as cytochrome oxidase I (COI) without conceptual or operational (e.g., parameter setting) adjustment. COI has a naturally high intraspecies variability that should be assessed and reported, as it is a source of highly valuable information. We contend that denoising and clustering are not alternatives. Rather, they are complementary and both should be used together in COI metabarcoding pipelines.

RESULTS

Using a COI dataset from benthic marine communities, we compared two denoising procedures (based on the UNOISE3 and the DADA2 algorithms), set suitable parameters for denoising and clustering, and applied these steps in different orders. Our results indicated that the UNOISE3 algorithm preserved a higher intra-cluster variability. We introduce the program DnoisE, which implements the UNOISE3 algorithm while taking into account the natural variability (measured as entropy) of each codon position in protein-coding genes. This correction increased the number of sequences retained by 88%. The order of the steps (denoising and clustering) had little influence on the final outcome.
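The idea of an entropy-aware denoising step can be illustrated with a minimal sketch. It is not the DnoisE implementation: the function names, the weighting scheme (each mismatch scaled by the relative entropy of its codon position) and the default `alpha` are assumptions made for illustration. The merge test follows the general UNOISE abundance-skew form, in which a rarer sequence is treated as an error of a more abundant one when their abundance ratio is at most 1/2^(alpha*d+1). Sequences are assumed to be aligned, of equal length, and to start at codon position 1.

```python
import math
from collections import Counter


def codon_position_entropy(seqs):
    """Mean Shannon entropy (bits) of the sites at codon positions 1, 2 and 3.

    Assumes aligned, equal-length sequences whose first base is codon position 1.
    """
    totals = [0.0, 0.0, 0.0]
    n_sites = [0, 0, 0]
    for i in range(len(seqs[0])):
        column = Counter(s[i] for s in seqs)
        n = sum(column.values())
        h = -sum((k / n) * math.log2(k / n) for k in column.values())
        totals[i % 3] += h
        n_sites[i % 3] += 1
    return [t / m if m else 0.0 for t, m in zip(totals, n_sites)]


def weighted_distance(a, b, entropy):
    """Mismatch count in which each difference is weighted by the entropy of
    its codon position (weights average to 1). Illustrative assumption only."""
    total = sum(entropy)
    if total == 0:
        weights = [1.0, 1.0, 1.0]  # no variability observed: plain Hamming distance
    else:
        weights = [3 * e / total for e in entropy]
    return sum(weights[i % 3] for i, (x, y) in enumerate(zip(a, b)) if x != y)


def should_merge(abund_low, abund_high, d, alpha=5.0):
    """UNOISE-style abundance-skew test for distance d (alpha is hypothetical here)."""
    return abund_low / abund_high <= 1.0 / 2 ** (alpha * d + 1)


# Toy alignment in which only third codon positions vary.
seqs = ["AAAAAA", "AAAAAA", "AACAAT", "AAGAAT"]
entropy = codon_position_entropy(seqs)
```

In this toy alignment all variability sits at third codon positions, so a third-position mismatch is weighted more heavily than a first- or second-position one. A difference at a naturally variable position therefore makes a sequence less likely to be merged as an error, which matches the direction of the correction described above.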

CONCLUSIONS

We highlight the need for combining denoising and clustering, with adequate choice of stringency parameters, in COI metabarcoding. We present a program that uses the coding properties of this marker to improve the denoising step. We recommend that researchers report their results in terms of both denoised sequences (a proxy for haplotypes) and the clusters formed (a proxy for species), and that they avoid collapsing the sequences of each cluster into a single representative. This will allow studies at the cluster level (ideally equating to species-level diversity) and at the intra-cluster level, and will ease additivity and comparability between studies.
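The reporting recommendation can be sketched as a simple data layout: keep one row per denoised sequence, tagged with its cluster, and derive cluster-level summaries from it rather than replacing the rows with one representative per cluster. The field names, identifiers and read counts below are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def summarize_clusters(seq_table):
    """Cluster-level summary derived from a per-sequence table.

    The per-sequence rows are left untouched, so intra-cluster
    variability stays available for later analyses.
    """
    summary = defaultdict(lambda: {"n_sequences": 0, "reads": 0})
    for row in seq_table:
        entry = summary[row["cluster"]]
        entry["n_sequences"] += 1
        entry["reads"] += row["reads"]
    return dict(summary)


# Hypothetical table: one row per denoised sequence (haplotype proxy).
seq_table = [
    {"seq_id": "seq_1", "cluster": "cluster_1", "reads": 120},
    {"seq_id": "seq_2", "cluster": "cluster_1", "reads": 30},
    {"seq_id": "seq_3", "cluster": "cluster_2", "reads": 45},
]

cluster_summary = summarize_clusters(seq_table)
```

Reporting both `seq_table` and `cluster_summary` gives the species-level proxy and the haplotype-level proxy side by side.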


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0dd7/8020537/9fcd5b4e7e79/12859_2021_4115_Fig1_HTML.jpg
