State Key Laboratory of Genetic Resources and Evolution and Yunnan Key Laboratory of Biodiversity and Ecological Security of Gaoligong Mountain, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China.
Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan, China.
Mol Ecol Resour. 2023 Jan;23(1):174-189. doi: 10.1111/1755-0998.13703. Epub 2022 Aug 30.
The accurate extraction of species-abundance information from DNA-based data (metabarcoding, metagenomics) could contribute usefully to diet analysis and food-web reconstruction, the inference of species interactions, the modelling of population dynamics and species distributions, the biomonitoring of environmental state and change, and the inference of false positives and negatives. However, multiple sources of bias and noise in sampling and processing combine to inject error into DNA-based data sets. To understand how to extract abundance information, it is useful to distinguish two concepts. (i) Within-sample across-species quantification describes relative species abundances in one sample. (ii) Across-sample within-species quantification describes how the abundance of each individual species varies from sample to sample, such as over a time series, an environmental gradient or different experimental treatments. First, we review the literature on methods to recover across-species abundance information (by removing what we call "species pipeline biases") and within-species abundance information (by removing what we call "pipeline noise"). We argue that many ecological questions can be answered with just within-species quantification, and we therefore demonstrate how to use a "DNA spike-in" to correct for pipeline noise and recover within-species abundance information. We also introduce a model-based estimator that can be used on data sets without a physical spike-in to approximate and correct for pipeline noise.
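The spike-in idea described above can be illustrated with a minimal sketch. It assumes the simplest possible setting, which the abstract does not spell out: the same amount of spike-in DNA is added to every sample, so the spike-in read count in a sample is proportional to that sample's pipeline noise, and dividing each species' reads by the spike-in reads makes one species comparable across samples. The function name `spike_in_correct`, the data layout and the example numbers are hypothetical and are not taken from the paper. The authors' actual procedure may include further terms, and this sketch does not cover their model-based estimator.

```python
from typing import Dict, List


def spike_in_correct(
    reads: List[Dict[str, int]],
    spike_reads: List[int],
    spike_input: float = 1.0,
) -> List[Dict[str, float]]:
    """Rescale per-sample read counts by the spike-in reads of that sample.

    Assumption (hypothetical, simplified): every sample received the same
    spike-in amount `spike_input`, so differences in spike-in reads between
    samples reflect pipeline noise only.
    """
    if len(reads) != len(spike_reads):
        raise ValueError("need one spike-in read count per sample")

    corrected = []
    for sample_reads, n_spike in zip(reads, spike_reads):
        if n_spike <= 0:
            # Without spike-in reads the sample cannot be rescaled.
            raise ValueError("spike-in read count must be positive")
        corrected.append(
            {sp: n * spike_input / n_spike for sp, n in sample_reads.items()}
        )
    return corrected


# Made-up example: two samples, two species, spike-in reads differ 4-fold.
samples = [{"sp_A": 200, "sp_B": 50}, {"sp_A": 200, "sp_B": 400}]
spike = [100, 400]
corrected = spike_in_correct(samples, spike)

for i, sample in enumerate(corrected):
    print(i, sample)
```

In this made-up example the raw reads of `sp_A` are identical in both samples, but after rescaling its value falls four-fold in the second sample, because that sample yielded four times as many spike-in reads. The rescaled values are only meaningful for comparing one species across samples (within-species quantification). They say nothing about the abundance of `sp_A` relative to `sp_B` within a sample, which would also require removing species pipeline biases.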