CH-Bin: A convex hull based approach for binning metagenomic contigs.

Affiliations

Department of Computer Science and Engineering, University of Moratuwa, Bandaranayake Mawatha, Moratuwa 10400, Sri Lanka.

School of Computing, Australian National University, Canberra ACT 2600, Australia; Flinders Accelerator for Microbiome Exploration, Flinders University, Bedford Park SA 5042, Australia.

Publication information

Comput Biol Chem. 2022 Oct;100:107734. doi: 10.1016/j.compbiolchem.2022.107734. Epub 2022 Jul 14.

DOI:10.1016/j.compbiolchem.2022.107734

PMID:35964419

Abstract

Metagenomics has enabled culture-independent analysis of micro-organisms present in environmental samples. Metagenomic binning, which groups contigs into bins that represent different taxonomic groups, is an important step in a typical metagenomic workflow and is performed after assembly. Most metagenomic binning tools represent the composition and coverage information of contigs as high-dimensional feature vectors. However, these tools rely on traditional Euclidean or Manhattan distance metrics, which become unreliable in high-dimensional spaces. We propose CH-Bin, a binning approach that uses convex hull distance to bin contigs represented by high-dimensional feature vectors. Using experiments on simulated and real datasets, we demonstrate that representing contigs with high-dimensional feature vectors can preserve additional information and lead to improved binning results. We further demonstrate that the convex hull distance-based binning approach can be used effectively to bin such high-dimensional data. To the best of our knowledge, this is the first work to represent the composition of contigs using oligonucleotides of multiple sizes, and the first to bin metagenomic contigs with a convex hull distance-based algorithm. The source code of CH-Bin is available at https://github.com/kdsuneraavinash/CH-Bin.
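The abstract names two ingredients: contig composition features built from oligonucleotides of several sizes, and a convex hull distance used to compare a contig with a bin. The Python sketch below illustrates both under stated assumptions; it is not CH-Bin's implementation. The k-mer sizes (3 and 4), the Frank-Wolfe solver for the point-to-hull distance, the nearest-hull assignment rule, and all function names (`composition_vector`, `hull_distance`, `assign_to_bin`) are hypothetical choices made for illustration; the linked repository holds the actual method. Coverage features, which the abstract also mentions, could be appended to the same vectors.

```python
import math
from itertools import product

# Complement table for A/C/G/T; any other character (e.g. N) is left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Sorted canonical k-mers: each k-mer is merged with its reverse complement."""
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in all_kmers})


def composition_vector(seq, ks=(3, 4)):
    """Concatenate normalised canonical k-mer frequencies for several k.

    ks=(3, 4) is an illustrative assumption, not CH-Bin's configuration.
    """
    seq = seq.upper()
    features = []
    for k in ks:
        index = {km: i for i, km in enumerate(canonical_kmers(k))}
        counts = [0] * len(index)
        for i in range(len(seq) - k + 1):
            canon = min(seq[i:i + k], revcomp(seq[i:i + k]))
            if canon in index:  # skips windows containing N or other non-ACGT symbols
                counts[index[canon]] += 1
        total = sum(counts)
        features.extend(c / total if total else 0.0 for c in counts)
    return features


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def hull_distance(x, points, max_iter=1000, tol=1e-12):
    """Euclidean distance from x to the convex hull of `points`.

    Solved with the Frank-Wolfe method and exact line search; an illustrative
    choice of solver, not necessarily the one CH-Bin uses.
    """
    y = [float(v) for v in points[0]]  # start at a vertex, so y always lies in the hull
    for _ in range(max_iter):
        r = [yi - xi for yi, xi in zip(y, x)]  # residual; gradient of ||y - x||^2 is 2r
        s = min(points, key=lambda p: _dot(p, r))  # vertex minimising the linearisation
        d = [si - yi for si, yi in zip(s, y)]
        gap = -_dot(r, d)  # Frank-Wolfe duality gap (up to a factor of 2); 0 at the optimum
        if gap <= tol:
            break
        gamma = min(1.0, gap / _dot(d, d))  # exact step for the quadratic, clipped to [0, 1]
        y = [yi + gamma * di for yi, di in zip(y, d)]
    return math.sqrt(sum((yi - xi) ** 2 for yi, xi in zip(y, x)))


def assign_to_bin(contig_vec, bins):
    """Hypothetical rule: pick the bin whose hull (member feature vectors) is nearest."""
    return min(bins, key=lambda name: hull_distance(contig_vec, bins[name]))


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
bins = {"A": square, "B": [(5, 5), (6, 5), (5, 6)]}
print(assign_to_bin((2, 0.5), bins))  # prints "A": distance 1.0 to bin A's hull
```

Frank-Wolfe needs only dot products with the member vectors, so it never builds the hull explicitly. That matters here because an explicit convex hull has a number of facets that grows rapidly with dimension, which makes it impractical for high-dimensional contig features.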

Similar articles

Accurate Binning of Metagenomic Contigs Using Composition, Coverage, and Assembly Graphs.

J Comput Biol. 2022 Dec;29(12):1357-1376. doi: 10.1089/cmb.2022.0262. Epub 2022 Nov 11.

GraphBin: refined binning of metagenomic contigs using assembly graphs.

Bioinformatics. 2020 Jun 1;36(11):3307-3313. doi: 10.1093/bioinformatics/btaa180.

HiFine: integrating Hi-C-based and shotgun-based methods to refine binning of metagenomic contigs.

Bioinformatics. 2022 May 26;38(11):2973-2979. doi: 10.1093/bioinformatics/btac295.

CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision.

BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):571. doi: 10.1186/s12859-017-1967-3.

Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity.

BMC Bioinformatics. 2017 Sep 20;18(1):425. doi: 10.1186/s12859-017-1835-1.

AFITbin: a metagenomic contig binning method using aggregate l-mer frequency based on initial and terminal nucleotides.

BMC Bioinformatics. 2024 Jul 16;25(1):241. doi: 10.1186/s12859-024-05859-7.

COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge.

Bioinformatics. 2017 Mar 15;33(6):791-798. doi: 10.1093/bioinformatics/btw290.

Binning Metagenomic Contigs Using Unsupervised Clustering and Reference Databases.

Interdiscip Sci. 2022 Dec;14(4):795-803. doi: 10.1007/s12539-022-00526-y. Epub 2022 May 31.

Unsupervised Binning of Metagenomic Assembled Contigs Using Improved Fuzzy C-Means Method.

IEEE/ACM Trans Comput Biol Bioinform. 2017 Nov-Dec;14(6):1459-1467. doi: 10.1109/TCBB.2016.2576452. Epub 2016 Jun 7.

Cited by

Solving genomic puzzles: computational methods for metagenomic binning.

Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae372.
