Edgar, Robert C.
BMC Bioinformatics. 2007 Jan 20;8:18. doi: 10.1186/1471-2105-8-18.
Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences (spacers) of similar length. The distinctive sequence signature of CRISPR repeats can be detected with general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always well suited to studying these repeats, and significant effort is sometimes needed to build additional tools and to analyze the output manually.
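The repeat-spacer signature described above can be illustrated with a deliberately naive detector. This is a minimal sketch, not PILER-CR's algorithm: it assumes exact repeat copies of a fixed length, whereas real CRISPR repeats are tens of bases long and may contain mismatches. The function name `find_crispr_like_arrays` and its parameters (`repeat_len`, `min_spacer`, `max_spacer`, `spacer_tol`, `min_copies`) are hypothetical, with toy default values chosen only so the example stays small.

```python
from collections import defaultdict


def find_crispr_like_arrays(seq, repeat_len=8, min_spacer=5, max_spacer=12,
                            spacer_tol=2, min_copies=3):
    """Hypothetical toy detector: return (repeat, start_positions) for runs of
    exact repeat copies separated by spacers of similar length.

    Assumption: repeats are exact and exactly repeat_len long; the defaults
    are toy values, not parameters taken from PILER-CR.
    """
    # Index every k-mer of length repeat_len by its start positions (ascending).
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for repeat, starts in positions.items():
        run = [starts[0]]
        for p in starts[1:]:
            # Spacer = gap between the end of the previous copy and this copy.
            spacer = p - run[-1] - repeat_len
            ok = min_spacer <= spacer <= max_spacer
            if ok and len(run) >= 2:
                # Require spacers within one array to have similar lengths.
                first_spacer = run[1] - run[0] - repeat_len
                ok = abs(spacer - first_spacer) <= spacer_tol
            if ok:
                run.append(p)
            else:
                # Close the current run; keep it only if it has enough copies.
                if len(run) >= min_copies:
                    arrays.append((repeat, run))
                run = [p]
        if len(run) >= min_copies:
            arrays.append((repeat, run))
    return arrays


if __name__ == "__main__":
    # Toy sequence: four copies of GATTACAG separated by 7-8 base spacers.
    toy = ("TT" + "GATTACAG" + "CCTGAAC" + "GATTACAG" + "TTGCAGTC"
           + "GATTACAG" + "ACGGTTA" + "GATTACAG" + "CC")
    print(find_crispr_like_arrays(toy))
```

On the toy sequence, the sketch reports a single array: `GATTACAG` at positions 2, 17, 33 and 48. Requiring both a bounded spacer length and similar spacer lengths within a run is what separates an array-like pattern from scattered copies of the same k-mer. A practical tool would also need to tolerate mismatches and variable repeat lengths.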
We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program is fast, processing a 5 Mb genome in about 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, and find that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes.
PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain. Source code and a Linux binary are freely available at http://www.drive5.com/pilercr.