Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA.
J Comput Biol. 2024 Oct;31(10):1008-1021. doi: 10.1089/cmb.2024.0663. Epub 2024 Sep 9.
Metagenomic Hi-C (metaHi-C) has shown remarkable potential for retrieving high-quality metagenome-assembled genomes from complex microbial communities. Nevertheless, existing metaHi-C-based contig binning methods rely solely on Hi-C interactions between contigs and disregard crucial biological information such as the presence of single-copy marker genes. To overcome this limitation, we introduce ImputeCC, an integrative contig binning tool optimized for metaHi-C datasets. ImputeCC combines Hi-C interactions with the discriminative power of single-copy marker genes to group marker-gene-containing contigs into preliminary bins. It also introduces a novel constrained random walk with restart algorithm to enhance Hi-C connectivity among contigs. Comprehensive assessments using both mock and real metaHi-C datasets from diverse environments demonstrate that ImputeCC consistently outperforms other Hi-C-based contig binning tools. A genus-level analysis of the sheep gut microbiota reconstructed by ImputeCC underlines its capability to recover key species from dominant genera and to identify previously unknown genera.
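To illustrate how a random walk with restart can enhance connectivity among contigs, the following is a minimal sketch of the plain, unconstrained technique on a toy contact matrix. It is not ImputeCC's implementation: the abstract does not specify the constraint, so none is modeled here, and the function names, the restart probability of 0.5, and the contact counts in `TOY_CONTACTS` are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical symmetric Hi-C contact counts for four contigs (illustrative only).
# Contigs 0 and 2 share no direct contacts but are both linked to contig 1.
TOY_CONTACTS: Matrix = [
    [0, 8, 0, 0],
    [8, 0, 5, 0],
    [0, 5, 0, 1],
    [0, 0, 1, 0],
]


def row_normalize(contacts: Matrix) -> Matrix:
    """Convert raw contact counts into per-row transition probabilities."""
    normalized = []
    for row in contacts:
        total = sum(row)
        # A contig with no contacts keeps an all-zero row.
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def random_walk_with_restart(
    contacts: Matrix,
    seed: int,
    restart_prob: float = 0.5,  # assumed value, not taken from the paper
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Return steady-state visiting probabilities for a walk restarting at `seed`."""
    n = len(contacts)
    transition = row_normalize(contacts)
    restart = [1.0 if i == seed else 0.0 for i in range(n)]
    scores = restart[:]
    for _ in range(max_iter):
        updated = []
        for j in range(n):
            # Probability mass flowing into contig j from every contig i.
            inflow = sum(scores[i] * transition[i][j] for i in range(n))
            updated.append((1 - restart_prob) * inflow + restart_prob * restart[j])
        delta = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    for seed in range(len(TOY_CONTACTS)):
        print(seed, [round(s, 4) for s in random_walk_with_restart(TOY_CONTACTS, seed)])
```

In this toy example, contig 2 receives a nonzero score when the walk restarts at contig 0 even though the two share no direct contacts, which is the sense in which such a walk can fill in missing connectivity. A constrained variant would additionally restrict which contigs the walk may start from or reach.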