Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen, IBG 5), Karlsruhe Institute of Technology (KIT), 76344 Eggenstein-Leopoldshafen, Germany.
Nucleic Acids Res. 2022 Jul 22;50(13):e76. doi: 10.1093/nar/gkac294.
As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell genomics and metagenomics. However, without access to cultured representatives for verifying taxon assignments, MDM genomes may lead to misleading conclusions based on misclassified or contaminant contigs, thereby obscuring our view of the uncultured microbial majority. Moreover, gradual database contamination by past genome submissions can cause error propagation that affects present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity, and the de facto gold standard genome assessment tool, CheckM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and a corresponding open-source Python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contamination overlooked by current screening approaches and sensitively detected misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view of 'microbial dark matter'.