Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain.
CNRS, FR2424, ABiMS, Station Biologique de Roscoff, Sorbonne Université, Roscoff, France.
PLoS One. 2024 Jun 6;19(6):e0303697. doi: 10.1371/journal.pone.0303697. eCollection 2024.
Two common approaches to studying the composition of environmental protist communities are metabarcoding and metagenomics. Raw metabarcoding data are usually processed into operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) through clustering or denoising approaches, respectively. Analogous approaches are used to assemble metagenomic reads into metagenome-assembled genomes (MAGs). Understanding the correspondence between the data produced by these two approaches can help to integrate information between the datasets and to explain how metabarcoding OTUs and MAGs are related to the underlying biological entities they are hypothesised to represent. MAGs do not contain the commonly used barcoding loci; therefore, sequence homology approaches cannot be used to match OTUs and MAGs. We attempted to match V9 metabarcoding OTUs from the 18S rRNA gene (V9 OTUs) and MAGs from the Tara Oceans expedition based on the correspondence of their relative abundances across the same set of samples. We evaluated several metrics for detecting correspondence between features in these two datasets and developed controls to filter artefacts of data structure and processing. After selecting the best-performing metrics, ranking the V9 OTU/MAG matches by their proportionality/correlation coefficients and applying a set of selection criteria, we identified candidate matches between V9 OTUs and MAGs. In some cases, V9 OTUs and MAGs could be matched with a one-to-one correspondence, implying that they likely represent the same underlying biological entity. More generally, the matches we observed could be classified into four scenarios: one V9 OTU matches many MAGs; many V9 OTUs match many MAGs; many V9 OTUs match one MAG; one V9 OTU matches one MAG. 
Notably, different V9 OTU/MAG matches from the same taxonomic group were not always classified in the same scenario, and all four scenarios could occur within a single taxonomic group, illustrating that factors beyond taxonomic lineage influence the relationship between OTUs and MAGs. Overall, each scenario leads to a different interpretation of V9 OTUs and MAGs and of how they compare in terms of the genomic and ecological diversity they represent.
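
The abundance-based matching and the four scenarios described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the selected metrics, so plain Pearson correlation stands in for the proportionality/correlation coefficients, the 0.9 threshold is arbitrary, and the controls and selection criteria are omitted. The function names, feature IDs and abundance values are all hypothetical. Scenarios are assigned here per connected group of matched features, which is one possible reading of the classification.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a constant profile carries no signal
        return 0.0
    return sxy / sqrt(sxx * syy)


def candidate_matches(otu_abund, mag_abund, threshold=0.9):
    """Return (otu, mag) pairs whose profiles correlate at or above threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    return {
        (otu, mag)
        for otu, x in otu_abund.items()
        for mag, y in mag_abund.items()
        if pearson(x, y) >= threshold
    }


def classify_scenarios(matches):
    """Group matches into connected components and label each component
    as one-to-one, one-to-many, many-to-one or many-to-many (OTUs-to-MAGs)."""
    adj = defaultdict(set)
    for otu, mag in matches:
        adj[("otu", otu)].add(("mag", mag))
        adj[("mag", mag)].add(("otu", otu))
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first walk over the bipartite match graph
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        otus = sorted(name for kind, name in comp if kind == "otu")
        mags = sorted(name for kind, name in comp if kind == "mag")
        label = ("one" if len(otus) == 1 else "many") + "-to-" + (
            "one" if len(mags) == 1 else "many")
        result.append((otus, mags, label))
    return result


# Toy relative abundances across the same five samples (made-up values).
otu_abund = {
    "otu_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "otu_b": [0.5, 0.1, 0.4, 0.2, 0.3],
    "otu_c": [0.5, 0.4, 0.3, 0.2, 0.1],
}
mag_abund = {
    "mag_1": [0.2, 0.4, 0.6, 0.8, 1.0],
    "mag_2": [1.0, 0.2, 0.8, 0.4, 0.6],
    "mag_3": [0.9, 0.3, 0.8, 0.5, 0.6],
}

matches = candidate_matches(otu_abund, mag_abund)
scenarios = classify_scenarios(matches)
for otus, mags, label in scenarios:
    print(label, otus, mags)
```

In this toy input, `otu_a` pairs only with `mag_1`, `otu_b` pairs with both `mag_2` and `mag_3`, and `otu_c` has no match. Because relative abundances are compositional, a proportionality measure computed on log-ratio-transformed data may be a better fit than raw correlation for real datasets.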