Institut François Jacob, CEA, CNRS, Génomique Métabolique - UMR 8030, Univ Evry, Université Paris-Saclay, Evry, France.
IBGBI-LaMME, Univ Evry, Université Paris-Saclay, Evry, France.
PLoS One. 2020 Dec 30;15(12):e0244637. doi: 10.1371/journal.pone.0244637. eCollection 2020.
The availability of large metagenomic data offers great opportunities for the population genomic analysis of uncultured organisms, which represent a large part of the unexplored biosphere and play a key ecological role. However, the majority of these organisms lack a reference genome or transcriptome, which constitutes a technical obstacle for classical population genomic analyses. We introduce the metavariant species (MVS) model, in which a species is represented only by intra-species nucleotide polymorphism. We designed a method combining reference-free variant calling, multiple density-based clustering and maximum-weighted independent set algorithms to cluster intra-species variants into MVSs directly from multisample metagenomic raw reads without a reference genome or read assembly. The frequencies of the MVS variants are then used to compute population genomic statistics such as FST, in order to estimate genomic differentiation between populations and to identify loci under natural selection. The MVS construction was tested on simulated and real metagenomic data. MVSs showed the required quality for robust population genomics and allowed an accurate estimation of genomic differentiation (ΔFST < 0.0001 and <0.03 on simulated and real data respectively). Loci predicted under natural selection on real data were all detected by MVSs. MVSs represent a new paradigm that may simplify and enhance holistic approaches for population genomics and the evolution of microorganisms.
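As an illustration of the last step, the sketch below shows one way variant frequencies could be turned into an FST value for two populations. The abstract does not say which FST estimator the authors use, so this assumes the textbook heterozygosity-based definition, FST = (HT - HS) / HT, for biallelic variants. The function names and the read-count inputs are hypothetical and are not taken from the published method.

```python
from typing import Iterable, Tuple


def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Estimate a variant's frequency in one sample from read counts (assumed input)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def heterozygosities(p1: float, p2: float) -> Tuple[float, float]:
    """Return (H_T, H_S) for one biallelic variant in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled population
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population value
    return h_t, h_s


def fst_locus(p1: float, p2: float) -> float:
    """Per-variant F_ST; defined as 0.0 when the variant is fixed in both populations."""
    h_t, h_s = heterozygosities(p1, p2)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def fst_global(freq_pairs: Iterable[Tuple[float, float]]) -> float:
    """Genome-wide F_ST as a ratio of sums over all variants of one species."""
    num = 0.0
    den = 0.0
    for p1, p2 in freq_pairs:
        h_t, h_s = heterozygosities(p1, p2)
        num += h_t - h_s
        den += h_t
    return 0.0 if den == 0.0 else num / den
```

Under these assumptions, variants with an unusually high per-variant value relative to the genome-wide value would be the candidates for loci under selection. The outlier test itself is not sketched here because the abstract does not describe it.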