Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia.
Donovan Parks, Bioinformatic Consultant, Castlegar, British Columbia, Canada.
Nat Methods. 2023 Aug;20(8):1203-1212. doi: 10.1038/s41592-023-01940-w. Epub 2023 Jul 27.
Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2's database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.