DMclust, a Density-based Modularity Method for Accurate OTU Picking of 16S rRNA Sequences.

Affiliation

Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.

Publication information

Mol Inform. 2017 Dec;36(12). doi: 10.1002/minf.201600059. Epub 2017 Jun 6.

Abstract

Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is a crucial step in analyzing metagenomic data. Although many methods have been developed, striking an appropriate balance between clustering accuracy and computational efficiency remains a major challenge. This paper proposes a novel density-based modularity clustering method, called DMclust, that bins 16S rRNA sequences into OTUs with high clustering accuracy. The DMclust algorithm consists of four main phases. First, it searches for dense groups of sequences, defined as n-sequence communities, in which the distance between any two sequences is less than a threshold. Second, it uses these dense groups to construct a weighted network in which the dense groups are the nodes, every pair of dense groups is connected by an edge, and the distance between the two groups is the weight of that edge. Third, a modularity-based community detection method is applied to the network to generate preclusters. Finally, the remaining sequences are assigned to their nearest preclusters to form OTUs. Experimental results on several metagenomic datasets show that DMclust achieves higher clustering accuracy than existing widely used methods, with acceptable memory usage.
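The four phases can be illustrated with a small Python sketch. It is not the authors' implementation, and almost every detail is an assumption made for illustration: the toy mismatch-fraction distance, reading "n-sequence community" as a group of exactly n sequences found by a greedy scan, converting each group distance d into a similarity 1 - d so that modularity rewards close groups, a naive greedy merge as the modularity-based community detection step, and mean distance as the measure of the "nearest" precluster. All function names are hypothetical.

```python
from itertools import combinations

def distance(a, b):
    """Toy distance (assumption): fraction of mismatches in equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_distance(seqs, g, h):
    """Average pairwise distance between two lists of sequence indices."""
    return sum(distance(seqs[i], seqs[j]) for i in g for j in h) / (len(g) * len(h))

def dense_groups(seqs, threshold, n):
    """Phase 1 (greedy stand-in): groups of n sequences, all pairwise closer than threshold."""
    unused = list(range(len(seqs)))
    groups, leftovers = [], []
    while unused:
        group = [unused[0]]
        for j in unused[1:]:
            if len(group) == n:
                break
            if all(distance(seqs[j], seqs[k]) < threshold for k in group):
                group.append(j)
        if len(group) == n:
            groups.append(group)
            unused = [i for i in unused if i not in group]
        else:
            leftovers.append(unused[0])  # seed could not form a full group
            unused = unused[1:]
    return groups, leftovers

def build_network(seqs, groups):
    """Phase 2: complete graph over groups; weight = 1 - mean distance (assumption)."""
    return {(a, b): 1.0 - mean_distance(seqs, groups[a], groups[b])
            for a, b in combinations(range(len(groups)), 2)}

def modularity(weights, labels):
    """Weighted modularity Q = sum_c [ W_in(c)/m - (strength(c)/(2m))^2 ]."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    strength, inside = {}, {}
    for (a, b), w in weights.items():
        strength[labels[a]] = strength.get(labels[a], 0.0) + w
        strength[labels[b]] = strength.get(labels[b], 0.0) + w
        if labels[a] == labels[b]:
            inside[labels[a]] = inside.get(labels[a], 0.0) + w
    return sum(inside.get(c, 0.0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())

def greedy_modularity(weights, n_nodes):
    """Phase 3 (naive stand-in): merge the community pair that most increases Q."""
    labels = list(range(n_nodes))
    best = modularity(weights, labels)
    while True:
        trials = [[c if l == d else l for l in labels]
                  for c, d in combinations(sorted(set(labels)), 2)]
        if not trials:
            return labels
        q, trial = max(((modularity(weights, t), t) for t in trials),
                       key=lambda pair: pair[0])
        if q <= best:
            return labels
        best, labels = q, trial

def dmclust_sketch(seqs, threshold=0.15, n=2):
    """Run the four phases and return OTUs as lists of sequence indices."""
    groups, leftovers = dense_groups(seqs, threshold, n)
    labels = greedy_modularity(build_network(seqs, groups), len(groups))
    preclusters = {}
    for group, label in zip(groups, labels):
        preclusters.setdefault(label, []).extend(group)
    cores = [list(members) for members in preclusters.values()]
    otus = [list(members) for members in cores]
    for s in leftovers:  # Phase 4: attach each remaining sequence to its nearest precluster
        if not cores:
            otus.append([s])
            continue
        nearest = min(range(len(cores)),
                      key=lambda c: mean_distance(seqs, [s], cores[c]))
        otus[nearest].append(s)
    return otus

toy_seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "TTAAAAAAAA", "TTTAAAAAAA",
            "CCCCCCCCCC", "CCCCCCCCCG", "GGCCCCCCCC", "GGGCCCCCCC",
            "AAAAAAAATT"]
print(dmclust_sketch(toy_seqs))
```

On this toy input the sketch forms four two-sequence dense groups, merges them into one A-like and one C-like precluster, and attaches the last sequence to the A-like precluster. Real 16S rRNA data would need an alignment-based or k-mer distance and a scalable community detection routine in place of the quadratic stand-ins used here.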
