Sun Yijun, Cai Yunpeng, Liu Li, Yu Fahong, Farrell Michael L, McKendree William, Farmerie William
Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL 32610-3622, USA.
Nucleic Acids Res. 2009 Jun;37(10):e76. doi: 10.1093/nar/gkp285. Epub 2009 May 5.
Recent metagenomics studies of environmental samples suggested that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is used for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is for large-scale problems and is able to analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that clearly demonstrate the effectiveness of the newly proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.