Curto Manuel, Veríssimo Ana, Riccioni Giulia, Santos Carlos D, Ribeiro Filipe, Jentoft Sissel, Alves Maria Judite, Gante Hugo F
MARE - Marine and Environmental Sciences Center/ARNET - Aquatic Research Network, Faculty of Sciences, University of Lisbon, Lisbon, Portugal.
CIBIO - Research Center in Biodiversity and Genetic Resources, Vairão, Portugal.
Mol Ecol Resour. 2025 Aug;25(6):e14105. doi: 10.1111/1755-0998.14105. Epub 2025 Apr 1.
Environmental DNA (eDNA) metagenomics sequences all DNA molecules present in environmental samples and can, in principle, identify virtually any organism from which those molecules derive. However, due to unacceptable levels of false positives and false negatives, this approach remains underexplored as a tool for biodiversity monitoring across the tree of life, particularly for non-microscopic eukaryotes. We present SeqIDist, a framework that combines multilocus BLAST matches against several reference databases with a subsequent analysis of sequence identity distribution patterns, in order to separate out false positives while revealing new biodiversity and increasing the accuracy of metagenomic approaches. We tested SeqIDist on an eDNA metagenomic dataset from a riverine site and benchmarked the results against those obtained with an eDNA metabarcoding approach. We start by characterising the biological community (~2000 taxa) across the tree of life at low taxonomic levels and show that eDNA metagenomics has a higher sensitivity than eDNA metabarcoding in discovering new diversity. We show that limited representation of whole genome sequences in reference databases can lead to false positives. For non-microscopic eukaryotes, eDNA metagenomic data often consist of a few sparse, anonymous sequences scattered across the genome, making metagenome assembly methods unfeasible. Finally, we infer eDNA source and residency time using read length distributions as a measure of decay status. The higher accuracy of SeqIDist opens the discussion of the potential of eDNA metagenomics for archived samples and its implementation in long-term biodiversity monitoring at a planetary scale.
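The abstract does not specify how SeqIDist analyses sequence identity distributions, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes that BLAST hits have already been reduced to (taxon, percent identity) pairs, and it flags a taxon as supported when enough of its hits have high identity. The function name `summarise_identity`, the thresholds `min_hits`, `high_identity` and `min_high_fraction`, and the example taxa are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import median


def summarise_identity(hits, min_hits=3, high_identity=97.0, min_high_fraction=0.5):
    """Summarise the percent-identity distribution of BLAST hits per taxon.

    hits: iterable of (taxon, percent_identity) pairs.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    by_taxon = defaultdict(list)
    for taxon, pident in hits:
        by_taxon[taxon].append(float(pident))

    summary = {}
    for taxon, idents in by_taxon.items():
        # Fraction of hits at or above the high-identity threshold.
        n_high = sum(1 for p in idents if p >= high_identity)
        high_fraction = n_high / len(idents)
        summary[taxon] = {
            "n_hits": len(idents),
            "median_identity": median(idents),
            "high_fraction": high_fraction,
            # A taxon is "supported" only with enough hits AND mostly
            # high-identity matches; a distribution shifted towards low
            # identity may indicate a related taxon missing from the database.
            "supported": len(idents) >= min_hits and high_fraction >= min_high_fraction,
        }
    return summary


# Made-up example data, for illustration only.
example_hits = [
    ("Taxon A", 99.5), ("Taxon A", 100.0), ("Taxon A", 98.7),
    ("Taxon B", 88.0), ("Taxon B", 90.5), ("Taxon B", 86.2),
    ("Taxon C", 100.0),
]
summary = summarise_identity(example_hits)
for taxon, stats in sorted(summary.items()):
    print(taxon, stats)
```

In this toy example, "Taxon A" is supported by several near-identical matches, "Taxon B" has a consistently lower identity distribution, and "Taxon C" rests on a single hit, so only the first passes the hypothetical filter. A real analysis would need thresholds calibrated per locus and per reference database.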