MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample.

Affiliation

Department of Computer Science, The University of Hong Kong, Hong Kong.

Publication information

Bioinformatics. 2012 Sep 15;28(18):i356-i362. doi: 10.1093/bioinformatics/bts397.

DOI: 10.1093/bioinformatics/bts397

PMID: 22962452

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3436824/

Abstract

MOTIVATION

Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable.

RESULTS

We propose a two-round binning method (MetaCluster 5.0) that aims to identify both low-abundance and high-abundance species in the presence of a large amount of noise caused by many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the low coverage of low-abundance species, multiple w values are used to group reads with overlapping w-mers: reads from high-abundance species are first grouped with high confidence using a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of, say, 6× to 10×) and can achieve better sensitivity and specificity using less memory and running time.
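The two-round idea of grouping reads by shared w-mers can be illustrated with a small sketch. This is not the MetaCluster 5.0 implementation. It only assumes that two reads are merged when they share at least one exact w-mer, that reads left in small groups after the strict round are regrouped with a shorter w, and that groups still below a size threshold are treated as noise. The function names, the `min_group_size` parameter and the toy reads are hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def wmers(read, w):
    """Return the set of all length-w substrings of a read."""
    return {read[i:i + w] for i in range(len(read) - w + 1)}


class UnionFind:
    """Minimal disjoint-set structure over read indices."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_reads(reads, w, uf, active):
    """Merge the reads listed in `active` that share at least one w-mer."""
    owner = {}  # w-mer -> index of the first read seen containing it
    for idx in active:
        for mer in wmers(reads[idx], w):
            if mer in owner:
                uf.union(owner[mer], idx)
            else:
                owner[mer] = idx


def two_round_grouping(reads, w_large, w_small, min_group_size):
    """Assumed simplification: strict round, relaxed round, size filter."""
    uf = UnionFind(len(reads))
    all_idx = range(len(reads))

    # Round 1: a large w links only reads with long exact overlaps.
    group_reads(reads, w_large, uf, all_idx)
    sizes = defaultdict(int)
    for i in all_idx:
        sizes[uf.find(i)] += 1
    leftover = [i for i in all_idx if sizes[uf.find(i)] < min_group_size]

    # Round 2: a shorter w is applied to the leftover reads only.
    group_reads(reads, w_small, uf, leftover)
    groups = defaultdict(list)
    for i in all_idx:
        groups[uf.find(i)].append(i)

    # Stand-in for noise filtering: groups that stay small are set aside.
    bins = sorted(g for g in groups.values() if len(g) >= min_group_size)
    noise = sorted(i for g in groups.values()
                   if len(g) < min_group_size for i in g)
    return bins, noise


# Toy data: five densely overlapping reads from one sequence, two reads
# with a 4-base overlap from another, and one unrelated read.
TOY_READS = [
    "AAACCCGG", "ACCCGGGT", "CCGGGTTT", "GGGTTTAC", "GTTTACGT",
    "TGCATCAG", "TCAGCTAG",
    "CTCTCTCT",
]
bins, noise = two_round_grouping(TOY_READS, w_large=6, w_small=4,
                                 min_group_size=2)
print(bins, noise)  # [[0, 1, 2, 3, 4], [5, 6]] [7]
```

In the toy data the densely covered sequence is recovered in the strict round, the sparsely covered pair is linked only once w is relaxed to 4, and the unrelated read is set aside. Restricting the relaxed round to leftover reads keeps short, less specific w-mers from merging groups that were already formed with high confidence.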

AVAILABILITY

http://i.cs.hku.hk/~alse/MetaCluster/

CONTACT

chin@cs.hku.hk.

References cited in this article

Separating metagenomic short reads into genomes via clustering.

Algorithms Mol Biol. 2012 Sep 26;7(1):27. doi: 10.1186/1748-7188-7-27.

MetaCluster 4.0: a novel binning algorithm for NGS reads and huge number of species.

J Comput Biol. 2012 Feb;19(2):241-9. doi: 10.1089/cmb.2011.0276.

A novel abundance-based algorithm for binning metagenomic sequences using l-tuples.

J Comput Biol. 2011 Mar;18(3):523-34. doi: 10.1089/cmb.2010.0245.

Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.

BMC Bioinformatics. 2010 Apr 16;11 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-11-S2-S5.

A human gut microbial gene catalogue established by metagenomic sequencing.

Nature. 2010 Mar 4;464(7285):59-65. doi: 10.1038/nature08821.

Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models.

Nat Methods. 2009 Sep;6(9):673-6. doi: 10.1038/nmeth.1358. Epub 2009 Aug 2.

Predominant role of host genetics in controlling the composition of gut microbiota.

PLoS One. 2008 Aug 26;3(8):e3064. doi: 10.1371/journal.pone.0003064.

Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes.

PLoS Biol. 2007 Mar;5(3):e82. doi: 10.1371/journal.pbio.0050082.

Accurate phylogenetic classification of variable-length DNA fragments.

Nat Methods. 2007 Jan;4(1):63-72. doi: 10.1038/nmeth976. Epub 2006 Dec 10.

Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies.

Appl Environ Microbiol. 2007 Jan;73(1):278-88. doi: 10.1128/AEM.01177-06. Epub 2006 Oct 27.
