Bland Charles, Ramsey Teresa L, Sabree Fareedah, Lowe Micheal, Brown Kyndall, Kyrpides Nikos C, Hugenholtz Philip
Department of Computer Science, Jackson State University, Jackson, MS 39217, USA.
BMC Bioinformatics. 2007 Jun 18;8:209. doi: 10.1186/1471-2105-8-209.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are a novel type of direct repeat found in a wide range of bacteria and archaea. CRISPRs are beginning to attract attention because of their proposed mechanism, that is, defending their hosts against invading extrachromosomal elements such as viruses. Existing repeat detection tools do a poor job of identifying CRISPRs because unique spacer sequences separate the repeats. In this study, a new tool, CRT, is introduced that rapidly and accurately identifies CRISPRs in large DNA strings, such as genomes and metagenomes.
CRT was compared to two CRISPR detection tools, Patscan and Pilercr. In terms of correctness, CRT was shown to be very reliable, demonstrating significant improvements over Patscan on the measures of precision, recall and quality. Compared to Pilercr, CRT showed improved recall and quality. In terms of speed, CRT was substantially faster than Patscan. CRT and Pilercr were comparable in speed; however, CRT was faster for genomes containing large numbers of repeats.
In this paper a new tool was introduced for the automatic detection of CRISPR elements. This tool, CRT, showed some important improvements over current techniques for CRISPR identification. CRT's approach to detecting repetitive sequences is straightforward. It uses a simple sequential scan of a DNA sequence and detects repeats directly without any major conversion or preprocessing of the input. This leads to a program that is easy to describe and understand; yet it is very accurate, fast and memory efficient, being O(n) in space and O(nm/l) in time.
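The idea of a sequential scan for repeats separated by spacers can be illustrated with a minimal Python sketch. This is not CRT's implementation: it assumes exact k-mer matches only, and the function name `find_spaced_repeats` and the parameters `k`, `min_gap`, `max_gap` and `min_copies` are hypothetical names chosen for illustration. It is not tuned to reproduce the space or time bounds quoted above.

```python
def find_spaced_repeats(seq, k=8, min_gap=20, max_gap=60, min_copies=3):
    """Scan seq left to right and report k-mers that recur at spacer-like intervals.

    Illustrative only: exact matching, no repeat extension, no mismatch tolerance.
    Returns a list of (kmer, [start positions]) tuples.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i + k <= len(seq):
        kmer = seq[i:i + k]
        positions = [i]
        while True:
            last = positions[-1]
            lo = last + k + min_gap        # earliest allowed start of the next copy
            hi = last + k + max_gap + k    # next copy must end before this index
            nxt = seq.find(kmer, lo, hi)   # search only inside the spacer window
            if nxt == -1:
                break
            positions.append(nxt)
        if len(positions) >= min_copies:
            hits.append((kmer, positions))
            i = positions[-1] + k          # resume the scan after the array
        else:
            i += 1
    return hits


# Toy input: three copies of an 8-base repeat separated by 25-base spacers.
toy = "T" * 10 + "ACGTTGCA" + "G" * 25 + "ACGTTGCA" + "C" * 25 + "ACGTTGCA" + "T" * 10
print(find_spaced_repeats(toy))  # [('ACGTTGCA', [10, 43, 76])]
```

Restricting each search to a bounded window after the previous copy is what keeps the scan from comparing every position against every other. A practical tool would also need to extend matches to the full repeat length and tolerate mismatches, which this sketch omits.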