Institut für Informatik, Johannes Gutenberg University Mainz, Mainz 55099, Germany.
Bioinformatics. 2012 Aug 15;28(16):2182-3. doi: 10.1093/bioinformatics/bts355. Epub 2012 Jun 23.
Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher-quality clusters than UCLUST and CD-HIT at a comparable runtime.
DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license.
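The NMI criterion named in the abstract compares a predicted clustering with a reference labelling of the same reads. The following is a minimal sketch of one common form, NMI = 2·I(U;V) / (H(U) + H(V)). The abstract does not say which normalization the authors used, so the formula and the function names `entropy`, `mutual_information`, and `nmi` are assumptions for illustration, not DySC's own evaluation code.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same items")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        # Both labelings put every item in a single cluster.
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical toy data: four reads with reference and predicted labels.
reference = ["taxonA", "taxonA", "taxonB", "taxonB"]
predicted = [1, 1, 0, 0]
score = nmi(reference, predicted)
```

Under this definition, a clustering that matches the reference up to a renaming of cluster labels scores 1, while a clustering that is statistically independent of the reference scores 0.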