
DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences.

Author information

Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.

Publication information

Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6.

DOI:10.1002/minf.201600059

PMID:28586119

Abstract

Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is a crucial step in analyzing metagenomic data. Although many methods have been developed, achieving an appropriate balance between clustering accuracy and computational efficiency remains a major challenge. This paper proposes DMclust, a novel density-based modularity clustering method that bins 16S rRNA sequences into OTUs with high clustering accuracy. The DMclust algorithm consists of four main phases. First, it searches for dense groups of sequences, each defined as an n-sequence community in which the distance between any two sequences is less than a threshold. Second, these dense groups are used to construct a weighted network in which each dense group is a node, every pair of dense groups is connected by an edge, and the distance between the two groups is the weight of that edge. Third, a modularity-based community detection method is applied to the network to generate preclusters. Finally, the remaining sequences are assigned to their nearest preclusters to form OTUs. Experimental results on several metagenomic datasets show that, compared with widely used existing methods, DMclust achieves higher clustering accuracy with acceptable memory usage.
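The four phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the toy Hamming-fraction distance (`hamming_fraction`), the greedy seed-based search for dense groups, the average-linkage distance between groups, the conversion of distance to an edge similarity (1 - distance) so that modularity can be computed, and the naive greedy merging used for community detection. All function names are hypothetical.

```python
from itertools import combinations

def hamming_fraction(a, b):
    """Toy distance (assumption): fraction of mismatches, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def find_dense_groups(seqs, threshold, n, dist):
    """Phase 1 (greedy sketch): groups in which every pair is closer than threshold."""
    free = list(range(len(seqs)))
    groups = []
    for seed in range(len(seqs)):
        if seed not in free:
            continue
        group = [seed]
        for j in free:
            if j != seed and all(dist(seqs[j], seqs[k]) < threshold for k in group):
                group.append(j)
        if len(group) >= n:  # keep only groups with at least n sequences
            groups.append(group)
            free = [i for i in free if i not in group]
    return groups, free  # free holds the remaining, ungrouped sequences

def modularity(communities, w):
    """Weighted Newman modularity; w maps (i, j) with i < j to an edge weight."""
    total = sum(w.values())
    if total == 0:
        return 0.0
    q = 0.0
    for c in communities:
        internal = sum(w[pair] for pair in combinations(sorted(c), 2))
        degree = sum(wt for pair, wt in w.items() for node in pair if node in c)
        q += internal / total - (degree / (2 * total)) ** 2
    return q

def merge_by_modularity(n_nodes, w):
    """Phase 3 (naive sketch): merge communities while modularity increases."""
    communities = [{i} for i in range(n_nodes)]
    best = modularity(communities, w)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(range(len(communities)), 2):
            merged = [c for k, c in enumerate(communities) if k not in (a, b)]
            merged.append(communities[a] | communities[b])
            candidates.append((modularity(merged, w), merged))
        q, merged = max(candidates, key=lambda t: t[0])
        if q <= best:
            break
        best, communities = q, merged
    return communities

def dmclust_sketch(seqs, threshold=0.15, n=2, dist=hamming_fraction):
    groups, leftovers = find_dense_groups(seqs, threshold, n, dist)
    # Phase 2: complete weighted network over dense groups.
    # Assumption: average-linkage distance, turned into similarity 1 - distance.
    w = {}
    for a, b in combinations(range(len(groups)), 2):
        pairs = [(i, j) for i in groups[a] for j in groups[b]]
        avg = sum(dist(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)
        w[(a, b)] = max(0.0, 1.0 - avg)
    communities = merge_by_modularity(len(groups), w)
    preclusters = [[i for g in c for i in groups[g]] for c in communities]
    if not preclusters:  # no dense group found: every sequence is its own OTU
        return [[s] for s in leftovers]
    # Phase 4: assign each remaining sequence to its nearest precluster.
    for s in leftovers:
        nearest = min(preclusters,
                      key=lambda p: min(dist(seqs[s], seqs[i]) for i in p))
        nearest.append(s)
    return [sorted(p) for p in preclusters]

if __name__ == "__main__":
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "AATTAAAAAA"]
    print(dmclust_sketch(reads))  # lists of sequence indices, one list per OTU
```

The sketch recomputes modularity from scratch for every candidate merge, which is only practical for a handful of dense groups; a real implementation would need an incremental community detection method and an alignment-based sequence distance.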

Similar articles

MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs.

Mol Biosyst. 2015 Jul;11(7):1907-13. doi: 10.1039/c5mb00089k.

M-pick, a modularity-based method for OTU picking of 16S rRNA sequences.

BMC Bioinformatics. 2013 Feb 7;14:43. doi: 10.1186/1471-2105-14-43.

DBH: A de Bruijn graph-based heuristic method for clustering large-scale 16S rRNA sequences into OTUs.

J Theor Biol. 2017 Jul 21;425:80-87. doi: 10.1016/j.jtbi.2017.04.019. Epub 2017 Apr 26.

hc-OTU: A Fast and Accurate Method for Clustering Operational Taxonomic Units Based on Homopolymer Compaction.

IEEE/ACM Trans Comput Biol Bioinform. 2018 Mar-Apr;15(2):441-451. doi: 10.1109/TCBB.2016.2535326. Epub 2016 Feb 26.

A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data.

J Comput Biol. 2019 Jun;26(6):618-624. doi: 10.1089/cmb.2018.0170. Epub 2018 Dec 5.

MSClust: A Multi-Seeds based Clustering algorithm for microbiome profiling using 16S rRNA sequence.

J Microbiol Methods. 2013 Sep;94(3):347-55. doi: 10.1016/j.mimet.2013.07.004. Epub 2013 Jul 28.

bioOTU: An Improved Method for Simultaneous Taxonomic Assignments and Operational Taxonomic Units Clustering of 16s rRNA Gene Sequences.

J Comput Biol. 2016 Apr;23(4):229-38. doi: 10.1089/cmb.2015.0214. Epub 2016 Mar 7.

Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis.

Appl Environ Microbiol. 2011 May;77(10):3219-26. doi: 10.1128/AEM.02810-10. Epub 2011 Mar 18.

Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering.

Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6.

Cited by

A toolbox of machine learning software to support microbiome analysis.

Front Microbiol. 2023 Nov 22;14:1250806. doi: 10.3389/fmicb.2023.1250806. eCollection 2023.

invMap: a sensitive mapping tool for long noisy reads with inversion structural variants.

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad726.

Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences.

Front Microbiol. 2021 Mar 24;12:644012. doi: 10.3389/fmicb.2021.644012. eCollection 2021.

smsMap: mapping single molecule sequencing reads by locating the alignment starting positions.

BMC Bioinformatics. 2020 Aug 4;21(1):341. doi: 10.1186/s12859-020-03698-w.

DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs.

Front Microbiol. 2019 Mar 12;10:428. doi: 10.3389/fmicb.2019.00428. eCollection 2019.

De novo clustering of long reads by gene from transcriptomics data.

Nucleic Acids Res. 2019 Jan 10;47(1):e2. doi: 10.1093/nar/gky834.

NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model.

BMC Bioinformatics. 2018 May 22;19(1):177. doi: 10.1186/s12859-018-2208-0.

The gut microbiota and immune checkpoint inhibitors.

Hum Vaccin Immunother. 2018;14(9):2178-2182. doi: 10.1080/21645515.2018.1442970. Epub 2018 Apr 9.
