Zhu Lihua J, Holmes Benjamin R, Aronin Neil, Brodsky Michael H
Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, United States of America; Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, United States of America.
Broad Institute of MIT and Harvard, McGovern Institute for Brain Research at MIT, Departments of Brain and Cognitive Sciences and Biological Engineering, MIT, Cambridge, MA, United States of America.
PLoS One. 2014 Sep 23;9(9):e108424. doi: 10.1371/journal.pone.0108424. eCollection 2014.
CRISPR-Cas systems are a diverse family of RNA-protein complexes in bacteria that target foreign DNA sequences for cleavage. Derivatives of these complexes have been engineered to cleave specific target sequences, with the target determined by the sequence of a CRISPR-derived guide RNA (gRNA) and the source of the Cas9 protein. The main considerations in gRNA design are maximizing on-target activity at the desired site and minimizing off-target cleavage. Because the understanding of existing CRISPR-Cas9-derived RNA-guided nucleases is advancing rapidly and novel RNA-guided nuclease systems continue to be developed, computational tools for designing target-specific RNA-guided nucleases must accommodate a wide range of parameters. We have developed CRISPRseek, a highly flexible, open source software package that identifies gRNAs targeting a given input sequence while minimizing off-target cleavage at other sites within any selected genome. CRISPRseek identifies potential gRNAs that target a sequence of interest for CRISPR-Cas9 systems from different bacterial species, and it generates a cleavage score for each potential off-target sequence using published or user-supplied weight matrices of position-specific mismatch penalty scores. Identified gRNAs can be further filtered to include only those that occur in paired orientations for increased specificity and/or those that overlap restriction enzyme sites. For applications in which gRNAs must discriminate between two related sequences, CRISPRseek can rank gRNAs by the difference between their predicted cleavage scores in each input sequence. CRISPRseek is implemented as a Bioconductor package within the R statistical programming environment, so it can be incorporated into computational pipelines that automate gRNA design for target sequences identified in a wide variety of genome-wide analyses.
CRISPRseek is available under the GNU General Public License v3.0 at http://www.bioconductor.org.