Wilding Thomas A, Stoeck Thorsten, Morrissey Barbara J, Carvalho Silvia Ferreira, Coulson Mark W
Scottish Association for Marine Science, Dunbeg, OBAN, PA34 1QA, UK.
Technische Universität Kaiserslautern, Dept. of Ecology, D-67663 Kaiserslautern, Germany.
Sci Total Environ. 2023 Feb 1;858(Pt 1):159735. doi: 10.1016/j.scitotenv.2022.159735. Epub 2022 Oct 28.
Human impacts on global ecosystems are increasing, and there is growing demand that these activities be appropriately monitored. Monitoring requires measurement of a response metric ('signal') that changes maximally and consistently in response to the monitored activity irrespective of other factors ('noise'), thus maximising the signal-to-noise ratio. Indices derived from time-consuming, morphology-based taxonomic identification of organisms are a core part of many monitoring programmes. Metabarcoding is an alternative to morphology-based identification and involves sequencing short fragments of DNA ('markers') from multiple taxa simultaneously. DNA suitable for metabarcoding includes DNA extracted from environmental samples (eDNA). Metabarcoding produces DNA sequences that can be identified (annotated) by matching them against archived, annotated sequences. However, sequences from most organisms are not archived, which prevents annotation and potentially limits the use of metabarcoding in monitoring applications. Consequently, there is growing interest in using unannotated sequences as response metrics in monitoring programmes. We compared the sequences from three commonly used markers (16S (V3/V4 regions), 18S (V1/V2 regions) and COI) and, sampling along steep impact gradients, showed that the 16S and COI sequences were associated with the largest and smallest signal-to-noise ratios, respectively. We trialled four separate, intuitive noise-reduction approaches and demonstrated that removing less frequent sequences improved the signal-to-noise ratio, partitioning an additional 25 % of the variation from noise to explanatory factors in non-parametric ANOVA (NPA) and reducing dispersion in the data. For the 16S marker, retaining only the most frequently observed sequence per sample, which left nine sequences across 150 samples, generated a near-maximal signal-to-noise ratio (95 % of the variance explained in NPA).
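The two filtering ideas above (dropping less frequent sequences, and keeping only each sample's most frequent sequence) can be sketched on a samples-by-sequences read-count matrix. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names and the `min_total_fraction` threshold are hypothetical, and "frequency" is assumed here to mean a sequence's share of all reads in the dataset.

```python
import numpy as np

def drop_rare_sequences(counts: np.ndarray, min_total_fraction: float) -> np.ndarray:
    """Boolean mask over sequences (columns) whose share of all reads in the
    dataset is at least `min_total_fraction` (a hypothetical threshold)."""
    totals = counts.sum(axis=0)
    return totals / totals.sum() >= min_total_fraction

def keep_top_sequence_per_sample(counts: np.ndarray) -> np.ndarray:
    """Boolean mask over sequences that are the most abundant sequence in at
    least one sample; ties go to the lowest column index (np.argmax)."""
    mask = np.zeros(counts.shape[1], dtype=bool)
    mask[np.argmax(counts, axis=1)] = True
    return mask

# Toy matrix: 3 samples (rows) x 4 unannotated sequences (columns).
counts = np.array([
    [50, 30, 1, 0],
    [5, 60, 0, 2],
    [40, 10, 0, 1],
])

rare_mask = drop_rare_sequences(counts, min_total_fraction=0.01)
top_mask = keep_top_sequence_per_sample(counts)
print(rare_mask.tolist())  # [True, True, False, True]
print(top_mask.tolist())   # [True, True, False, False]
filtered = counts[:, top_mask]  # reduced matrix that would be passed on to NPA
```

The per-sample "top sequence" rule only compares sequences within a sample, so it is unaffected by differences in sequencing depth between samples; the dataset-wide threshold is not, so converting counts to per-sample proportions or rarefying first may be preferable.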
We recommend that NPA, combined with rigorous elimination of less frequent sequences, be used to pre-filter the sequences/taxa used in monitoring applications. Our approach will simplify downstream analyses, for example the identification of key taxa and functional associations.
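One way to read the NPA-based screen is as a permutational ANOVA on a dissimilarity matrix, with the proportion of variation explained by the impact factor (R²) serving as the signal-to-noise measure. The sketch below assumes a one-way PERMANOVA on Bray-Curtis dissimilarities; the abstract does not state the NPA design, distance measure, or software, so the function names, the single grouping factor, and the toy data are all illustrative assumptions.

```python
import numpy as np

def bray_curtis(counts: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between samples (rows).
    Assumes every sample has at least one read."""
    x = counts.astype(float)
    num = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    den = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return num / den

def permanova_one_way(dist: np.ndarray, groups, n_perm: int = 999, seed: int = 0):
    """One-way PERMANOVA. Returns (pseudo_F, R2, p_value), where R2 is the
    share of total variation explained by the grouping factor."""
    d2 = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    # Total sum of squares: squared distances over all pairs i < j, divided by n.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n

    def ss_within(g: np.ndarray) -> float:
        # Same formula applied within each group, divided by that group's size.
        ss = 0.0
        for lab in labels:
            idx = np.flatnonzero(g == lab)
            sub = d2[np.ix_(idx, idx)]
            ss += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        return ss

    def pseudo_f(g: np.ndarray) -> float:
        ss_w = ss_within(g)  # residual ("noise") sum of squares
        ss_a = ss_total - ss_w  # among-group ("signal") sum of squares
        return (ss_a / (a - 1)) / (ss_w / (n - a))

    f_obs = pseudo_f(groups)
    r2 = (ss_total - ss_within(groups)) / ss_total
    # Permutation test: shuffle group labels, count pseudo-F values >= observed.
    rng = np.random.default_rng(seed)
    exceed = sum(int(pseudo_f(rng.permutation(groups)) >= f_obs) for _ in range(n_perm))
    p_value = (exceed + 1) / (n_perm + 1)
    return f_obs, r2, p_value

# Toy data: two "impact" groups of two samples each, three sequences.
counts = np.array([[10, 0, 0], [9, 1, 0], [0, 10, 0], [1, 9, 0]])
f, r2, p = permanova_one_way(bray_curtis(counts), ["A", "A", "B", "B"], n_perm=99)
print(round(f, 3), round(r2, 4))  # 162.0 0.9878
```

Under this reading, a filtered sequence set that raises R² has moved variation from the residual (noise) term into the explanatory factor. With only four samples there are just three distinct two-by-two groupings, so the permutation p-value is necessarily coarse; realistic sample sizes are needed for a meaningful test.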