Clustering metagenomic sequences with interpolated Markov models.

Affiliation

Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, College Park, MD 20742, USA.

Publication information

BMC Bioinformatics. 2010 Nov 2;11:544. doi: 10.1186/1471-2105-11-544.


DOI: 10.1186/1471-2105-11-544
PMID: 21044341
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3098094/
Abstract

BACKGROUND: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering together the reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of the environments of interest to many metagenomics projects.

RESULTS: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets and propose PHYSCIMM, a hybrid of SCIMM and the supervised learning method Phymm, which performs better when evolutionarily close training genomes are available.

CONCLUSIONS: SCIMM and PHYSCIMM are highly accurate methods for clustering metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available as open source from http://www.cbcb.umd.edu/software/scimm.
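The abstract does not spell out the algorithm, so the following is only a rough illustration of what unsupervised, model-based clustering of reads can look like. It is a minimal sketch under stated assumptions, not the SCIMM implementation: it swaps the interpolated Markov models for plain fixed-order Markov models with pseudocounts, and it assumes a k-means-style loop (train one model per cluster, reassign each read to its best-scoring model, repeat). All function names and parameters (`train_markov`, `log_likelihood`, `cluster`, `order`, `pseudocount`) are hypothetical.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Fit a fixed-order Markov model: context -> {base: log-probability}.

    Assumption: a fixed-order model stands in for an interpolated one.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, pseudocount))
    for s in seqs:
        for i in range(order, len(s)):
            ctx, base = s[i - order:i], s[i]
            if base in ALPHABET:  # skip ambiguous bases such as N
                counts[ctx][base] += 1
    model = {}
    for ctx, c in counts.items():
        total = sum(c.values())
        model[ctx] = {b: math.log(c[b] / total) for b in ALPHABET}
    return model


def log_likelihood(seq, model, order=2):
    """Score a read under a model; unseen contexts fall back to uniform."""
    uniform = math.log(0.25)
    total = 0.0
    for i in range(order, len(seq)):
        probs = model.get(seq[i - order:i])
        if probs is not None and seq[i] in probs:
            total += probs[seq[i]]
        else:
            total += uniform
    return total


def cluster(seqs, k=2, order=2, iterations=20, seed=0):
    """k-means-style loop: retrain per-cluster models, then reassign reads."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]  # random initial partition
    for _ in range(iterations):
        models = [
            train_markov([s for s, lab in zip(seqs, labels) if lab == c], order)
            for c in range(k)
        ]
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(s, models[c], order))
            for s in seqs
        ]
        if new_labels == labels:  # assignments stable: stop early
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    reads = ["ATATATATATAT", "TATATATATATA", "GCGCGCGCGCGC", "CGCGCGCGCGCG"]
    print(cluster(reads, k=2))
```

Because the starting partition is random, the result depends on `seed`, and a loop of this kind can settle in a poor local optimum. A real tool would need a more careful initialization and a way to choose the number of clusters.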


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba4a/3098094/b53bb8af963f/1471-2105-11-544-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba4a/3098094/45b667e36cc8/1471-2105-11-544-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba4a/3098094/3ba1878a6a87/1471-2105-11-544-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba4a/3098094/8d69c5eacdf6/1471-2105-11-544-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba4a/3098094/5d9eb8d5e556/1471-2105-11-544-5.jpg
