CLUSTOM: a novel method for clustering 16S rRNA next generation sequences by overlap minimization.

Affiliation

Biological Resource Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea.

Publication information

PLoS One. 2013 May 1;8(5):e62623. doi: 10.1371/journal.pone.0062623. Print 2013.

Abstract

The recent nucleic acid sequencing revolution, driven by shotgun and high-throughput technologies, has led to a rapid increase in the number of sequences available for microbial communities. The availability of 16S ribosomal RNA (rRNA) gene sequences from a multitude of natural environments now offers a unique opportunity to study microbial diversity and community structure. The large volume of sequencing data, however, makes it time-consuming to assign individual sequences to phylotypes by searching them against public databases. Because ribosomal sequences have diverged across prokaryotic species, they can be grouped into clusters that represent operational taxonomic units. However, available clustering programs suffer from overlap of sequence spaces between adjacent clusters. In natural environments, gene sequences are homogeneous within species but divergent between species. This evolutionary constraint results in an uneven distribution of genetic distances between genes in sequence space. To cluster 16S rRNA sequences more accurately, it is therefore essential to select core sequences located at the centers of the genetic-distance distributions of the sequences within taxonomic units. Based on this idea, we describe a novel sequence clustering algorithm named CLUSTOM that minimizes the overlaps between adjacent clusters. The performance of this algorithm was evaluated in a comparison with existing programs, using the reference sequences of the SILVA database as well as published pyrosequencing datasets. The test revealed that our algorithm achieves higher accuracy than ESPRIT-Tree and mothur, which are among the best clustering algorithms. The results indicate that the concept of an uneven distribution of sequence distances can be used to cluster 16S rRNA gene sequences effectively. CLUSTOM has been implemented both as a web application and as a standalone command-line application, both of which are available at http://clustom.kribb.re.kr.
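The idea of building clusters around core sequences can be illustrated with a small sketch. This is not the CLUSTOM algorithm, whose details the abstract does not give; it is a generic density-first greedy clustering written for illustration. The function names `hamming_fraction` and `cluster_by_core`, the use of mismatch fraction on pre-aligned sequences as the genetic distance, and the threshold values are all assumptions.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_by_core(seqs: list[str], threshold: float = 0.03) -> dict[int, list[int]]:
    """Greedy clustering around core sequences (illustrative, not CLUSTOM itself).

    A core is the unassigned sequence with the most unassigned neighbours
    within `threshold`; ties go to the lowest index. Each sequence is
    assigned to exactly one cluster, so clusters never share members.
    Returns a mapping from core index to the sorted member indices.
    """
    n = len(seqs)
    # Every sequence is its own neighbour; add all pairs within the threshold.
    neighbours = {i: {i} for i in range(n)}
    for i, j in combinations(range(n), 2):
        if hamming_fraction(seqs[i], seqs[j]) <= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    unassigned = set(range(n))
    clusters: dict[int, list[int]] = {}
    while unassigned:
        # Pick the densest remaining point as the next core.
        core = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = neighbours[core] & unassigned
        clusters[core] = sorted(members)
        unassigned -= members
    return clusters


if __name__ == "__main__":
    toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC", "CCCCCCCCCG"]
    print(cluster_by_core(toy, threshold=0.1))
```

In the toy input, sequence 1 lies between sequences 0 and 2, so picking it as the core keeps all three in one cluster; seeding from sequence 0 instead would leave sequence 2 on its own. That is the intuition behind choosing cores near the center of a distance distribution, under the assumptions stated above.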
