Enterome, 94-96 Avenue Ledru Rollin, Paris, France.
MGP MetaGénoPolis, INRA, Université Paris-Saclay, Jouy en Josas, France.
Bioinformatics. 2019 May 1;35(9):1544-1552. doi: 10.1093/bioinformatics/bty830.
MOTIVATION: Analysis toolkits for shotgun metagenomic data achieve strain-level characterization of complex microbial communities by capturing intra-species gene content variation. Yet these tools are limited by the available reference genomes, which are far from covering all microbial variability, as many species have not yet been sequenced or have only a few strains available. Binning co-abundant genes obtained from de novo assembly is a powerful reference-free technique to discover and reconstitute the gene repertoire of microbial species. While current methods accurately identify species core parts, they miss many accessory genes or split them into small gene groups that remain unassociated with core clusters.
RESULTS: We introduce MSPminer, a computationally efficient software tool that reconstitutes Metagenomic Species Pan-genomes (MSPs) by binning co-abundant genes across metagenomic samples. MSPminer relies on a new robust measure of proportionality coupled with an empirical classifier to group and distinguish not only species core genes but also accessory genes. Applied to a large-scale metagenomic dataset, MSPminer successfully delineates in a few hours the gene repertoires of 1661 microbial species with similar specificity and higher sensitivity than existing tools. The taxonomic annotation of MSPs reveals hitherto unknown microorganisms and brings coherence to the nomenclature of the species of the human gut microbiota. The provided MSPs can be readily used for taxonomic profiling and biomarker discovery in human gut metagenomic samples. In addition, MSPminer can be applied to gene count tables from other ecosystems to perform similar analyses.
AVAILABILITY AND IMPLEMENTATION: The binary is freely available for non-commercial users at www.enterome.com/downloads.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
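The abstract does not define the proportionality measure or the classifier, so the following is only an illustrative sketch of the general idea of co-abundance binning, not MSPminer's actual method. It assumes that two genes from the same species have counts that are roughly proportional across the samples where both are detected, and that an accessory gene may be absent from some samples where the core is present. The function name `proportionality`, the median-of-ratios estimator, the `tolerance` parameter, and the toy count vectors are all hypothetical choices made for this example.

```python
from statistics import median


def proportionality(core, gene, tolerance=0.25):
    """Compare a candidate gene's counts to a core signal across samples.

    Hypothetical illustration only; this is NOT the measure used by MSPminer.
    Returns (coefficient, agreement, presence) or None if too few samples:
      coefficient: median of gene/core ratios (robust to a few outlier samples)
      agreement:   fraction of shared samples whose ratio lies within
                   `tolerance` (relative) of the coefficient
      presence:    fraction of core-positive samples where the gene is detected
    """
    core_positive = [(c, g) for c, g in zip(core, gene) if c > 0]
    # Ratios are only defined where both the core and the gene are detected.
    ratios = [g / c for c, g in core_positive if g > 0]
    if len(ratios) < 3:
        return None  # not enough shared samples to judge proportionality
    coefficient = median(ratios)
    within = sum(1 for r in ratios if abs(r - coefficient) <= tolerance * coefficient)
    agreement = within / len(ratios)
    presence = len(ratios) / len(core_positive)
    return coefficient, agreement, presence


# Toy count vectors over six samples (made-up numbers).
core = [10, 20, 0, 40, 80, 30]
co_abundant = [21, 39, 0, 82, 400, 60]  # proportional, with one outlier sample
accessory = [20, 0, 0, 80, 0, 60]       # proportional where present, often absent

print(proportionality(core, co_abundant))
print(proportionality(core, accessory))
```

In this toy setting, a gene with high agreement and presence close to 1 would behave like a core gene, while high agreement with lower presence would be consistent with an accessory gene. The median is used here because a single aberrant sample barely shifts it, which is one simple way to obtain robustness; the published tool may proceed quite differently.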