Ultrafast clustering algorithms for metagenomic sequence analysis.

Affiliation

Center for Research in Biological Systems, University of California San Diego, USA.

Publication information

Brief Bioinform. 2012 Nov;13(6):656-68. doi: 10.1093/bib/bbs035. Epub 2012 Jul 6.

DOI:10.1093/bib/bbs035

PMID:22772836

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3504929/

Abstract

Rapid advances in high-throughput sequencing technologies have dramatically promoted metagenomic studies of microbial communities in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations, along with their functions and interactions. However, the massive quantity and complexity of these sequence data pose tremendous challenges for data analysis. These challenges include, but are not limited to, ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. Clustering analysis also addresses the challenges listed above: a large redundant data set can be represented by a small non-redundant set in which each cluster is represented by a single entry or a consensus; artifacts can be rapidly detected through clustering; and errors can be identified, filtered or corrected using the consensus of the sequences within a cluster.
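The idea of replacing a redundant set with cluster representatives or consensus sequences can be illustrated with a minimal sketch. It assumes a greedy incremental strategy and a gap-free, position-by-position identity measure; both are simplifications chosen for illustration and are not claimed to be the algorithms reviewed in the paper. The function names (`identity`, `greedy_cluster`, `consensus`), the toy reads and the 0.9 threshold are all hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Simplifying assumption: no gaps, so sequences of different length score 0.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Greedy incremental clustering; the first member of each cluster is its representative."""
    clusters: list[list[str]] = []
    # Process the longest sequences first so that they become representatives.
    for seq in sorted(seqs, key=len, reverse=True):
        for cluster in clusters:
            if identity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


def consensus(cluster: list[str]) -> str:
    """Majority base at each position; ties go to the base seen first in the column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


# Toy data (hypothetical): five reads drawn from two underlying sequences.
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",  # one mismatch, e.g. a sequencing error
    "TTGGCCAATT",
    "TTGGCCAATT",
]
clusters = greedy_cluster(reads, threshold=0.9)
representatives = [c[0] for c in clusters]    # small non-redundant set
consensuses = [consensus(c) for c in clusters]  # the isolated mismatch is voted out
```

In this toy case the five reads collapse into two clusters, and the single-base error in the third read disappears from the consensus of its cluster. Real tools avoid the all-against-representatives comparison shown here by using fast filters such as short-word counting before any alignment is attempted.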


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d259/3504929/7427950d74f7/bbs035f1.jpg
