BMC3C: binning metagenomic contigs using codon usage, sequence composition and read coverage.

Affiliations

College of Computer and Information Science, Southwest University, Chongqing, China.

School of Life Sciences and Partner State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.

Publication information

Bioinformatics. 2018 Dec 15;34(24):4172-4179. doi: 10.1093/bioinformatics/bty519.

PMID:29947757
Abstract

MOTIVATION

Metagenomics investigates DNA sequences recovered directly from environmental samples. An analysis often starts with read assembly, which yields contigs rather than more complete genomes. Contig binning methods are therefore used to group contigs into genome bins. While several clustering-based binning methods have been developed, they generally suffer from problems of stability and robustness.

RESULTS

We introduce BMC3C, an ensemble clustering-based method that bins contigs accurately and robustly by making use of DNA sequence Composition, Coverage across multiple samples and Codon usage. BMC3C first searches for a proper number of clusters and then repeatedly applies k-means clustering with different initializations to cluster the contigs. Next, it derives a weighted graph from these clusterings, with each node representing a contig: if two contigs are frequently grouped into the same cluster, the edge weight between them is high; otherwise it is low. Finally, BMC3C employs a graph partitioning technique to partition the weighted graph into subgraphs, each corresponding to a genome bin. We evaluate BMC3C on both simulated and real-world datasets and compare it with state-of-the-art binning tools, showing that BMC3C outperforms them. To our knowledge, this is the first time that codon usage features and ensemble clustering have been used in metagenomic contig binning.
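The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general ensemble-clustering idea, not the BMC3C implementation. Several choices are assumptions made for illustration: tetranucleotide composition, coverage normalized to a per-sample profile, codon usage counted in reading frame 0 of the whole contig (rather than on predicted genes), a fixed number of clusters `k` (the cluster-number search is omitted), and a simple stand-in for the graph partitioning step (connected components of edges whose co-clustering weight exceeds a threshold). All function names are hypothetical.

```python
import itertools

import numpy as np

BASES = "ACGT"


def _word_freqs(seq, k, step):
    """Normalized frequencies of all 4**k DNA words, sliding a size-k window by `step`."""
    index = {"".join(p): i for i, p in enumerate(itertools.product(BASES, repeat=k))}
    counts = np.zeros(len(index))
    for i in range(0, len(seq) - k + 1, step):
        j = index.get(seq[i:i + k].upper())
        if j is not None:  # skip words containing N or other ambiguity codes
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def contig_features(seq, coverages, k=4):
    """Concatenate composition, coverage and codon-usage features (assumed encodings)."""
    composition = _word_freqs(seq, k, step=1)
    cov = np.asarray(coverages, dtype=float)
    cov_profile = cov / cov.sum() if cov.sum() else cov
    codons = _word_freqs(seq, 3, step=3)  # assumption: frame 0 of the whole contig
    return np.concatenate([composition, cov_profile, codons])


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means with random initial centers drawn from the data."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def coassociation(X, k, n_runs=20, seed=0):
    """Weight graph: fraction of k-means runs in which contigs i and j share a cluster."""
    rng = np.random.default_rng(seed)
    W = np.zeros((len(X), len(X)))
    for _ in range(n_runs):
        labels = kmeans(X, k, rng)  # each run draws a different random initialization
        W += labels[:, None] == labels[None, :]
    return W / n_runs


def partition(W, threshold=0.5):
    """Stand-in partitioning: connected components over edges with weight > threshold."""
    n = len(W)
    bins = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if bins[start] >= 0:
            continue
        bins[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(W[u] > threshold):
                if bins[v] < 0:
                    bins[v] = current
                    stack.append(v)
        current += 1
    return bins
```

In use, one would stack one `contig_features(seq, coverages)` row per contig into a matrix `X` and call `partition(coassociation(X, k))` to obtain a bin label per contig. Averaging many randomly initialized k-means runs is what makes the weights robust to any single poor initialization; the actual feature scaling, cluster-number search and partitioning algorithm used by BMC3C are not specified in the abstract.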

AVAILABILITY AND IMPLEMENTATION

The BMC3C code is available at http://mlda.swu.edu.cn/codes.php?name=BMC3C.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
BMC3C: binning metagenomic contigs using codon usage, sequence composition and read coverage.
Bioinformatics. 2018 Dec 15;34(24):4172-4179. doi: 10.1093/bioinformatics/bty519.
2
SolidBin: improving metagenome binning with semi-supervised normalized cut.
Bioinformatics. 2019 Nov 1;35(21):4229-4238. doi: 10.1093/bioinformatics/btz253.
3
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision.
BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):571. doi: 10.1186/s12859-017-1967-3.
4
GraphBin: refined binning of metagenomic contigs using assembly graphs.
Bioinformatics. 2020 Jun 1;36(11):3307-3313. doi: 10.1093/bioinformatics/btaa180.
5
Accurate Binning of Metagenomic Contigs Using Composition, Coverage, and Assembly Graphs.
J Comput Biol. 2022 Dec;29(12):1357-1376. doi: 10.1089/cmb.2022.0262. Epub 2022 Nov 11.
6
COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.
Bioinformatics. 2017 Mar 15;33(6):791-798. doi: 10.1093/bioinformatics/btw290.
7
Improving contig binning of metagenomic data using [Formula: see text] oligonucleotide frequency dissimilarity.
BMC Bioinformatics. 2017 Sep 20;18(1):425. doi: 10.1186/s12859-017-1835-1.
8
HiFine: integrating Hi-C-based and shotgun-based methods to refine binning of metagenomic contigs.
Bioinformatics. 2022 May 26;38(11):2973-2979. doi: 10.1093/bioinformatics/btac295.
9
METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs.
BMC Bioinformatics. 2021 Jul 22;22(Suppl 10):378. doi: 10.1186/s12859-021-04284-4.
10
Improving metagenomic binning results with overlapped bins using assembly graphs.
Algorithms Mol Biol. 2021 May 4;16(1):3. doi: 10.1186/s13015-021-00185-6.

Cited by

1
Binning Metagenomic Contigs Using Contig Embedding and Decomposed Tetranucleotide Frequency.
Biology (Basel). 2024 Sep 24;13(10):755. doi: 10.3390/biology13100755.
2
Environmental community transcriptomics: strategies and struggles.
Brief Funct Genomics. 2025 Jan 15;24. doi: 10.1093/bfgp/elae033.
3
Solving genomic puzzles: computational methods for metagenomic binning.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae372.
4
BASALT refines binning from metagenomic data and increases resolution of genome-resolved metagenomic analysis.
Nat Commun. 2024 Mar 11;15(1):2179. doi: 10.1038/s41467-024-46539-7.
5
Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters.
Front Bioinform. 2023 Mar 3;3:1157956. doi: 10.3389/fbinf.2023.1157956. eCollection 2023.
6
MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities.
Genome Biol. 2023 Jan 6;24(1):1. doi: 10.1186/s13059-022-02832-6.
7
Metagenomic binning with assembly graph embeddings.
Bioinformatics. 2022 Sep 30;38(19):4481-4487. doi: 10.1093/bioinformatics/btac557.
8
Binning long reads in metagenomics datasets using composition and coverage information.
Algorithms Mol Biol. 2022 Jul 11;17(1):14. doi: 10.1186/s13015-022-00221-z.
9
Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data.
Microbiologyopen. 2022 Jun;11(3):e1298. doi: 10.1002/mbo3.1298.
10
Binning Metagenomic Contigs Using Unsupervised Clustering and Reference Databases.
Interdiscip Sci. 2022 Dec;14(4):795-803. doi: 10.1007/s12539-022-00526-y. Epub 2022 May 31.