Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA.
Bioinformatics. 2022 May 26;38(11):2973-2979. doi: 10.1093/bioinformatics/btac295.
Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs' composition and abundance profiles and are impaired when too few samples are available to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using composition information alone. As an alternative, Hi-C-based binning employs the metagenomic Hi-C technique to measure proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes and reduce the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline that overcomes the shortcomings of both types of binning methods on a single sample.
We develop HiFine, a novel binning pipeline that refines the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine first fragments the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of the initial bins, and then merges the fragmented bins and recruits unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs.
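The abstract does not spell out how the fragmentation step works. As an illustration only, the sketch below shows one plausible reading: two contigs stay in the same fragment only if both the Hi-C-based and the shotgun-based tool placed them in the same bin. The function name `fragment_bins`, the dictionary input format, and the choice to treat contigs binned by only one tool as leftovers are all assumptions made for this example, not HiFine's actual implementation; the merging and recruitment steps are not shown.

```python
from collections import defaultdict


def fragment_bins(hic_bins, shotgun_bins):
    """Intersect two bin assignments (hypothetical helper, not HiFine's API).

    Each argument maps a contig name to the bin label one tool gave it.
    Returns (fragments, leftovers):
      fragments maps a (hic_label, shotgun_label) pair to the set of
      contigs that received exactly that pair of labels;
      leftovers is the set of contigs binned by only one of the two tools
      (assumed here to be handled later, e.g. by a recruitment step).
    """
    binned_by_both = hic_bins.keys() & shotgun_bins.keys()
    fragments = defaultdict(set)
    for contig in binned_by_both:
        # Contigs share a fragment only when both tools agree on grouping.
        fragments[(hic_bins[contig], shotgun_bins[contig])].add(contig)
    leftovers = (hic_bins.keys() | shotgun_bins.keys()) - binned_by_both
    return dict(fragments), leftovers


if __name__ == "__main__":
    # Toy labels invented for the example.
    hic = {"c1": "H1", "c2": "H1", "c3": "H1", "c4": "H2"}
    shotgun = {"c1": "S1", "c2": "S1", "c3": "S2", "c5": "S2"}
    fragments, leftovers = fragment_bins(hic, shotgun)
    for labels, contigs in sorted(fragments.items()):
        print(labels, sorted(contigs))
    print("leftovers:", sorted(leftovers))
```

In this toy input, the Hi-C bin `H1` is split because the shotgun tool separated `c3` from `c1` and `c2`. This reflects the general idea that intersecting two partitions can only make bins smaller and, when either tool is right about a split, purer, at the cost of completeness that a later merging step would need to restore.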
HiFine is available at https://github.com/dyxstat/HiFine.
Supplementary data are available at Bioinformatics online.