Borderes Marianne, Gasc Cyrielle, Prestat Emmanuel, Galvão Ferrarini Mariana, Vinga Susana, Boucinha Lilia, Sagot Marie-France
MaaT Pharma, 317 Avenue Jean Jaurès, 69007 Lyon, France.
Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France.
NAR Genom Bioinform. 2021 Mar 1;3(1):lqab009. doi: 10.1093/nargab/lqab009. eCollection 2021 Mar.
The human gut microbiota performs functions that are essential for maintaining host physiology. However, characterizing how microbial communities function in relation to the host remains challenging in reference-based metagenomic analyses. Because taxonomic and functional analyses are performed independently, the link between genes and species remains unclear. Although a first set of species-level bins was built by clustering co-abundant genes, no reference bin set has been established for the most widely used gut microbiota catalog, the Integrated Gene Catalog (IGC). To identify the most suitable method for grouping the IGC genes, we benchmarked nine taxonomy-independent binners implementing abundance-based, hybrid and integrative approaches. To this end, we designed a simulated non-redundant gene catalog (SGC) and computed adapted assessment metrics. Overall, the best trade-off between the main metrics was reached by an integrative binner. For each approach, we then compared the results of the best-performing binner with the expected community structures and applied the method to the IGC. Each of the three approaches has specific advantages, as well as inherent or scalability limitations. Hybrid and integrative binners show promising and potentially complementary results, but require improvements before they can be used on the IGC to recover human gut microbial species.