Li Bin, Chen Poshen B, Diao Yarui
Department of Cellular and Molecular Medicine, University of California, San Diego School of Medicine, La Jolla, CA 92093, USA.
Department of Cell Biology, Department of Orthopaedic Surgery, and Regeneration Next Initiative, Duke University Medical Center, Durham, NC 27710, USA.
NAR Genom Bioinform. 2021 Feb 23;3(1):lqab013. doi: 10.1093/nargab/lqab013. eCollection 2021 Mar.
CRISPR is a revolutionary genome-editing tool that has been broadly used and integrated into novel biotechnologies. A major component of existing CRISPR design tools is the search engine that finds off-target sites with up to a predefined number of mismatches. To speed up this step, many CRISPR design tools have adapted sequence alignment tools as their search engines; commonly used examples include BLAST, BLAT, Bowtie, Bowtie2 and BWA. Alignment tools use heuristic algorithms to align large numbers of sequences with high performance. However, because of the seed-and-extend algorithms implemented in these tools, they are likely to provide incomplete off-target information for ultra-short sequences such as 20-bp guide RNAs (gRNAs). An incomplete list of off-target sites may lead to erroneous CRISPR design. To address this problem, we derived four sets of gRNAs to evaluate the accuracy of existing search engines, and we introduce a new search engine, CRISPR-SE. CRISPR-SE is an accurate and fast search engine that uses a brute-force approach: all gRNAs are virtually compared with the query gRNA, so the accuracy of the results is guaranteed. We benchmarked the accuracy of multiple search engines. As expected, the alignment tools reported incomplete and differing lists of off-target sites, whereas CRISPR-SE performed well in both accuracy and speed. As an accurate, high-performance search engine, CRISPR-SE will improve the quality of CRISPR design.
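The brute-force idea described above can be illustrated with a short sketch. In this sketch, every candidate gRNA site is compared with the query instead of only the sites that share an exact seed. This is a minimal illustration under stated assumptions, not CRISPR-SE's actual implementation. It assumes SpCas9-style 20-nt spacers followed by an NGG PAM, scans both strands of an in-memory sequence, and counts mismatches using a 2-bit encoding with XOR. All function names (`encode`, `mismatches`, `candidate_sites`, `search`) are hypothetical.

```python
from typing import Iterator, List, Tuple

# Assumed 2-bit encoding: each base occupies one 2-bit lane of an integer.
_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def encode(seq: str) -> int:
    """Pack an ACGT string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | _CODE[base]
    return value


def mismatches(a: int, b: int, length: int) -> int:
    """Count positions where two 2-bit-encoded sequences of `length` bases differ."""
    diff = a ^ b
    # A lane is non-zero iff the bases differ; fold each lane's high bit onto its low bit.
    low_mask = int("01" * length, 2)
    lanes = (diff | (diff >> 1)) & low_mask
    return bin(lanes).count("1")


def revcomp(seq: str) -> str:
    """Reverse complement; non-ACGT characters become N."""
    return "".join(_COMP.get(b, "N") for b in reversed(seq))


def candidate_sites(genome: str, spacer_len: int = 20) -> Iterator[Tuple[int, str, str]]:
    """Yield (offset, strand, spacer) for every spacer immediately followed by an NGG PAM.

    The offset is measured within the strand being scanned; for "-" it indexes
    the reverse-complement string.
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - spacer_len - 3 + 1):
            spacer = seq[i:i + spacer_len]
            pam_gg = seq[i + spacer_len + 1:i + spacer_len + 3]
            if pam_gg == "GG" and set(spacer) <= set("ACGT"):
                yield i, strand, spacer


def search(query: str, genome: str, max_mm: int) -> List[Tuple[int, str, str, int]]:
    """Exhaustively compare the query with every candidate site (no seeding)."""
    q = encode(query.upper())
    hits = []
    for offset, strand, spacer in candidate_sites(genome, len(query)):
        mm = mismatches(q, encode(spacer), len(query))
        if mm <= max_mm:
            hits.append((offset, strand, spacer, mm))
    return hits
```

Because no seed filter is applied, every site within `max_mm` mismatches is reported. This is the completeness guarantee the abstract attributes to a brute-force search. The cost is one comparison per candidate site for each query. Seed-and-extend aligners avoid that cost but may skip sites whose mismatches fall inside every seed.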