Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.
Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai, China.
Nat Commun. 2022 Apr 28;13(1):2326. doi: 10.1038/s41467-022-29843-y.
Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) in which sequences predicted to originate from the same genome are automatically grouped together. The most widely used binning methods are reference-independent: they operate de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open-source tool that uses deep siamese neural networks to implement a semi-supervised approach; that is, SemiBin exploits the information in reference genomes while retaining the capability of reconstructing high-quality bins that are outside the reference dataset. Using simulated and real microbiome datasets from several habitats in GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, SemiBin returns more high-quality bins with greater taxonomic diversity than other methods, including more distinct genera and species.
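The siamese idea mentioned in the abstract can be illustrated with a minimal sketch. It is not SemiBin's implementation: the abstract only states that deep siamese neural networks are used in a semi-supervised way. The pair labels ("must-link" and "cannot-link"), the one-layer encoder, the contrastive loss, and all feature values below are assumptions chosen for illustration. The one property the sketch does capture is that both contigs in a pair pass through the same shared weights.

```python
import math

def embed(features, weights):
    """Shared encoder (hypothetical): one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(feat_a, feat_b, must_link, weights, margin=1.0):
    """Pull must-link pairs together; push cannot-link pairs at least `margin` apart."""
    # Both inputs use the SAME weights: this sharing is what makes the network siamese.
    d = distance(embed(feat_a, weights), embed(feat_b, weights))
    if must_link:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical toy data: 2-dimensional contig features and identity weights.
weights = [[1.0, 0.0], [0.0, 1.0]]
contig_a = [0.2, 0.1]
contig_b = [0.2, 0.1]    # assumed to come from the same genome as contig_a
contig_c = [-0.9, 0.8]   # assumed to come from a different genome

same_genome_loss = contrastive_loss(contig_a, contig_b, True, weights)
different_genome_loss = contrastive_loss(contig_a, contig_c, False, weights)
violated_loss = contrastive_loss(contig_a, contig_b, False, weights)
```

In this toy setup, a pair that satisfies its constraint contributes no loss, while a cannot-link pair that embeds to the same point is penalized by the full squared margin. Training would adjust the shared weights to reduce such penalties before the learned embeddings are clustered into bins.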