Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.
Nat Commun. 2023 Jan 31;14(1):502. doi: 10.1038/s41467-023-35945-y.
The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables the reconstruction of high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Unlike other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as an information source complementary to the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow feces, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC.