A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio.

Affiliation

Department of Computer Science, The University of Hong Kong, Hong Kong.

Publication information

Bioinformatics. 2011 Jun 1;27(11):1489-95. doi: 10.1093/bioinformatics/btr186. Epub 2011 Apr 14.

DOI: 10.1093/bioinformatics/btr186
PMID: 21493653
Abstract

MOTIVATION

With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step in metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) obtained by sequencing a sample of mixed species. This step is referred to as 'binning'. Binning algorithms based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms or on phylogenetic markers. Because reference genomes are available for only a limited number of species, and markers are biased and scarce, these algorithms may not be applicable in all cases. Unsupervised binning algorithms, which can handle fragments from unknown species, provide an alternative approach. However, existing unsupervised binning algorithms work only on datasets with either balanced species abundance ratios or very different abundance ratios, but not both.
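The "sequence composition" signal that unsupervised binners work from is commonly a normalized k-mer frequency profile of each fragment. The minimal sketch below computes such a profile. The function name `kmer_profile` and the default `k = 4` (tetranucleotides) are illustrative assumptions, not details taken from this paper.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Return the normalized k-mer frequency vector of a DNA fragment.

    Hypothetical helper for illustration. Components follow the
    lexicographic order of all 4**k k-mers over A, C, G, T; windows
    containing any other character (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    # Slide a window of length k over the fragment and count each valid k-mer.
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        # No valid k-mer: the fragment is too short or all bases are ambiguous.
        return [0.0] * len(counts)
    # Divide by the total so that fragments of different lengths are comparable.
    return [c / total for c in counts]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    print(len(profile), profile[1])  # 16 entries; "AC" is 2 of the 7 windows
```

Many composition-based binners also merge each k-mer with its reverse complement because the strand of a read is unknown. This sketch omits that step for brevity.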

RESULTS

In this article, we present MetaCluster 3.0, an integrated binning method based on an unsupervised top-down separation and bottom-up merging strategy. It can bin metagenomic fragments of species with abundance ratios ranging from very balanced (e.g. 1:1) to very different (e.g. 1:24), with consistently higher accuracy than existing methods.
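The abstract names the strategy but not its details. The sketch below is a generic illustration of a "split, then merge" binner, not MetaCluster 3.0 itself. It assumes each fragment has already been turned into a numeric feature vector (such as a k-mer profile). It then over-partitions the vectors with plain k-means (the top-down separation) and greedily merges clusters whose centroids lie within a distance threshold (the bottom-up merging). Euclidean distance, k-means, the parameters `k` and `threshold`, and all function names are hypothetical choices made for illustration.

```python
import math
import random

Vector = list[float]


def mean(points: list[Vector]) -> Vector:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def top_down_split(points: list[Vector], k: int, iters: int = 20,
                   seed: int = 0) -> list[list[Vector]]:
    """Top-down step (assumed): over-partition points into up to k clusters with k-means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups: list[list[Vector]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute centroids; an emptied cluster keeps its previous center.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]


def bottom_up_merge(groups: list[list[Vector]],
                    threshold: float) -> list[list[Vector]]:
    """Bottom-up step (assumed): merge the closest pair while their centroids are within threshold."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        centroids = [mean(g) for g in groups]
        d, i, j = min((math.dist(centroids[a], centroids[b]), a, b)
                      for a in range(len(groups))
                      for b in range(a + 1, len(groups)))
        if d > threshold:
            break
        groups[i].extend(groups.pop(j))  # j > i, so index i stays valid
    return groups


def bin_fragments(points: list[Vector], k: int, threshold: float,
                  seed: int = 0) -> list[list[Vector]]:
    """Hypothetical two-phase binner: split into k clusters, then merge."""
    return bottom_up_merge(top_down_split(points, k, seed=seed), threshold)


if __name__ == "__main__":
    # Toy 2-D "profiles": a rare species (4 fragments) and an abundant one (20), ratio 1:5.
    rare = [[0.1 * i, 0.0] for i in range(4)]
    abundant = [[10.0 + 0.1 * i, 10.0] for i in range(20)]
    bins = bin_fragments(rare + abundant, k=6, threshold=3.0)
    print(sorted(len(b) for b in bins))  # [4, 20]
```

In this toy setup, choosing `k` larger than the expected number of species splits the abundant group into several pieces instead of letting it absorb the rare one. The merge pass then reunites those pieces. Real fragments would use high-dimensional composition profiles and a carefully tuned distance and threshold.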

AVAILABILITY

MetaCluster 3.0 can be downloaded at http://i.cs.hku.hk/~alse/MetaCluster/.

Similar articles

1
A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio.
Bioinformatics. 2011 Jun 1;27(11):1489-95. doi: 10.1093/bioinformatics/btr186. Epub 2011 Apr 14.
2
Unsupervised binning of environmental genomic fragments based on an error robust selection of l-mers.
BMC Bioinformatics. 2010 Apr 16;11 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-11-S2-S5.
3
MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample.
Bioinformatics. 2012 Sep 15;28(18):i356-i362. doi: 10.1093/bioinformatics/bts397.
4
A New Unsupervised Binning Approach for Metagenomic Sequences Based on N-grams and Automatic Feature Weighting.
IEEE/ACM Trans Comput Biol Bioinform. 2014 Jan-Feb;11(1):42-54. doi: 10.1109/TCBB.2013.137.
5
MBMC: An Effective Markov Chain Approach for Binning Metagenomic Reads from Environmental Shotgun Sequencing Projects.
OMICS. 2016 Aug;20(8):470-9. doi: 10.1089/omi.2016.0081. Epub 2016 Jul 22.
6
MetaCluster 4.0: a novel binning algorithm for NGS reads and huge number of species.
J Comput Biol. 2012 Feb;19(2):241-9. doi: 10.1089/cmb.2011.0276.
7
MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning.
BMC Genomics. 2014;15 Suppl 1(Suppl 1):S12. doi: 10.1186/1471-2164-15-S1-S12. Epub 2014 Jan 24.
8
A novel abundance-based algorithm for binning metagenomic sequences using l-tuples.
J Comput Biol. 2011 Mar;18(3):523-34. doi: 10.1089/cmb.2010.0245.
9
TWARIT: an extremely rapid and efficient approach for phylogenetic classification of metagenomic sequences.
Gene. 2012 Sep 1;505(2):259-65. doi: 10.1016/j.gene.2012.06.014. Epub 2012 Jun 15.
10
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision.
BMC Bioinformatics. 2017 Dec 28;18(Suppl 16):571. doi: 10.1186/s12859-017-1967-3.

Cited by

1
Solving genomic puzzles: computational methods for metagenomic binning.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae372.
2
Metagenomic-based surveillance systems for antibiotic resistance in non-clinical settings.
Front Microbiol. 2022 Dec 2;13:1066995. doi: 10.3389/fmicb.2022.1066995. eCollection 2022.
3
Alignment-Free Sequence Analysis and Applications.
Annu Rev Biomed Data Sci. 2018 Jul;1:93-114. doi: 10.1146/annurev-biodatasci-080917-013431. Epub 2018 Apr 25.
4
MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage.
BMC Bioinformatics. 2019 Nov 22;20(Suppl 9):367. doi: 10.1186/s12859-019-2904-4.
5
A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms.
BMC Genomics. 2019 Oct 21;20(1):754. doi: 10.1186/s12864-019-6119-x.
6
FEAST: fast expectation-maximization for microbial source tracking.
Nat Methods. 2019 Jul;16(7):627-632. doi: 10.1038/s41592-019-0431-x. Epub 2019 Jun 10.
7
Metagenomics Investigation of Agarlytic Genes and Genomes in Mangrove Sediments in China: A Potential Repertory for Carbohydrate-Active Enzymes.
Front Microbiol. 2018 Aug 14;9:1864. doi: 10.3389/fmicb.2018.01864. eCollection 2018.
8
CoreProbe: A Novel Algorithm for Estimating Relative Abundance Based on Metagenomic Reads.
Genes (Basel). 2018 Jun 20;9(6):313. doi: 10.3390/genes9060313.
9
When old metagenomic data meet newly sequenced genomes, a case study.
PLoS One. 2018 Jun 14;13(6):e0198773. doi: 10.1371/journal.pone.0198773. eCollection 2018.
10
Loeffler 4.0: Diagnostic Metagenomics.
Adv Virus Res. 2017;99:17-37. doi: 10.1016/bs.aivir.2017.08.001. Epub 2017 Sep 21.