Island Ecology and Evolution Research Group, Institute of Natural Products and Agrobiology (IPNA-CSIC), San Cristóbal de la Laguna, Spain.
Department of Life Sciences, Natural History Museum, London, UK.
Mol Ecol Resour. 2021 Aug;21(6):1772-1787. doi: 10.1111/1755-0998.13337. Epub 2021 Feb 24.
Metabarcoding of Metazoa using mitochondrial genes may be confounded both by the accumulation of PCR and sequencing artefacts and by the co-amplification of nuclear mitochondrial pseudogenes (NUMTs). The application of read abundance thresholds and denoising methods is effective in reducing noise accompanying authentic mitochondrial amplicon sequence variants (ASVs). However, these procedures do not fully account for the complex nature of concomitant sequences and the highly variable DNA contribution of specimens in a metabarcoding sample. We propose, as a complement to denoising, the metabarcoding Multidimensional Abundance Threshold Evaluation (metaMATE) framework, a novel approach that allows comprehensive examination of multiple dimensions of abundance filtering and the evaluation of the prevalence of unwanted concomitant sequences in denoised metabarcoding datasets. metaMATE requires a denoised set of ASVs as input, and designates a subset of ASVs as either authentic (mitochondrial DNA haplotypes) or nonauthentic (NUMTs and erroneous sequences) by comparison to external reference data and by analysing nucleotide substitution patterns. metaMATE (i) facilitates the application of read abundance filtering strategies, which are structured with regard to sequence library and phylogeny and applied across a range of increasing abundance threshold values, and (ii) evaluates their performance by quantifying the prevalence of nonauthentic ASVs and the collateral effects on the removal of authentic ASVs. The output from metaMATE facilitates decision-making about the required filtering stringency and can be used to improve the reliability of intraspecific genetic information derived from metabarcode data. The framework is implemented in the metaMATE software (available at https://github.com/tjcreedy/metamate).
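The designation and evaluation steps summarised above can be illustrated with a minimal Python sketch. It is not metaMATE's implementation and does not reproduce its interface; the helper names (`designate`, `filter_by_library`, `evaluate`) and the toy data are hypothetical. The sketch assumes that (i) an ASV is treated as authentic if it exactly matches a reference sequence, (ii) the substitution-pattern analysis is replaced by a simple reading-frame test (a length change not divisible by three, or an in-frame TAA/TAG stop codon under the invertebrate mitochondrial code), and (iii) filtering uses only the library dimension, as a per-library relative read-abundance threshold, leaving out the phylogenetic dimension.

```python
from typing import Dict, List, Set, Tuple

# ASSUMPTION: invertebrate mitochondrial code (NCBI translation table 5),
# where only TAA and TAG are stop codons (TGA = Trp, AGA/AGG = Ser).
STOP_CODONS = {"TAA", "TAG"}


def designate(seqs: Dict[str, str], reference: Set[str],
              expected_len: int, frame: int = 0) -> Dict[str, str]:
    """Hypothetical helper: label each ASV authentic, nonauthentic or unknown."""
    labels: Dict[str, str] = {}
    for name, seq in seqs.items():
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        if seq in reference:
            labels[name] = "authentic"  # exact match to external reference data
        elif (len(seq) - expected_len) % 3 != 0 or any(c in STOP_CODONS for c in codons):
            labels[name] = "nonauthentic"  # frameshift or in-frame stop: likely NUMT/error
        else:
            labels[name] = "unknown"  # cannot be verified either way
    return labels


def filter_by_library(counts: Dict[str, Dict[str, int]],
                      threshold: float) -> Set[str]:
    """Keep an ASV if its share of reads reaches `threshold` in at least one library."""
    kept: Set[str] = set()
    for lib_counts in counts.values():
        total = sum(lib_counts.values())
        for asv, n in lib_counts.items():
            if total > 0 and n / total >= threshold:
                kept.add(asv)
    return kept


def evaluate(labels: Dict[str, str], counts: Dict[str, Dict[str, int]],
             thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Per threshold: (threshold, fraction authentic kept, fraction nonauthentic kept)."""
    n_auth = sum(v == "authentic" for v in labels.values())
    n_non = sum(v == "nonauthentic" for v in labels.values())
    results = []
    for t in thresholds:
        kept = filter_by_library(counts, t)
        auth = sum(labels.get(a) == "authentic" for a in kept)
        non = sum(labels.get(a) == "nonauthentic" for a in kept)
        results.append((t,
                        auth / n_auth if n_auth else float("nan"),
                        non / n_non if n_non else float("nan")))
    return results


# Toy data, invented for illustration only.
SEQS = {
    "asv1": "ATGGCACGATAT",
    "asv2": "ATGTAACGATAT",  # in-frame TAA stop codon
    "asv3": "ATGGCACGATA",   # 1-nt deletion (frameshift)
    "asv4": "ATGGCTCGATAT",
    "asv5": "ATGCCACGATAT",
}
REFERENCE = {"ATGGCACGATAT", "ATGGCTCGATAT"}
COUNTS = {
    "lib1": {"asv1": 900, "asv2": 20, "asv3": 80},
    "lib2": {"asv4": 50, "asv5": 940, "asv2": 10},
}

if __name__ == "__main__":
    labels = designate(SEQS, REFERENCE, expected_len=12)
    for t, auth, non in evaluate(labels, COUNTS, [0.0, 0.03, 0.1]):
        print(f"threshold={t:.2f} authentic kept={auth:.2f} nonauthentic kept={non:.2f}")
```

On the toy data, a threshold of 0.03 removes one of the two nonauthentic ASVs while keeping both authentic ones, whereas 0.1 removes all nonauthentic ASVs but also discards the low-abundance authentic `asv4`. This is the kind of trade-off between residual nonauthentic ASVs and collateral loss of authentic ASVs that the evaluation step is meant to expose.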