Rahman Md Khaledur, Rahman M Sohel
AℓEDA group, Dept. of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.
Dept. of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
PLoS One. 2017 Aug 2;12(8):e0181943. doi: 10.1371/journal.pone.0181943. eCollection 2017.
The CRISPR/Cas9-sgRNA system has recently become a popular tool for genome editing and an active topic in medical research. In this system, the Cas9 protein is directed to a desired genomic location and cleaves the target DNA sequence that is complementary to a 20-nucleotide guide sequence within the sgRNA. Considerable effort, ranging from in vivo selection to in silico modeling, has gone into the efficient design of sgRNAs for the CRISPR/Cas9 system. In this article, we present a novel tool, called CRISPRpred, for efficient in silico prediction of sgRNA on-target activity, based on a Support Vector Machine (SVM) model. For our experiments, we used a benchmark dataset of 17 genes and 5310 guide sequences, of which only 20% are true values. CRISPRpred achieves an Area Under the Receiver Operating Characteristic Curve (AUROC), Area Under the Precision-Recall Curve (AUPR), and maximum Matthews Correlation Coefficient (MCC) of 0.85, 0.56, and 0.48, respectively. Our tool shows an approximately 5% improvement in AUPR and, after analyzing all evaluation metrics, we find that CRISPRpred performs better than the current state of the art. CRISPRpred is flexible enough to extract relevant features and use them in a learning algorithm. The source code of the entire software, together with the relevant dataset, is available at: https://github.com/khaled-buet/CRISPRpred.
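As an illustration of the "maximum MCC" metric reported above, the following minimal sketch scans every distinct prediction score as a decision threshold and keeps the threshold with the highest Matthews Correlation Coefficient. It is not taken from the CRISPRpred source code: the function names `mcc` and `max_mcc`, the toy labels and scores, and the reading of "true values" as positive labels are all assumptions made for this example.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a marginal is empty (zero denominator).
    return (tp * tn - fp * fn) / denom if denom else 0.0

def max_mcc(labels, scores):
    """Try each distinct score as a threshold; return (best MCC, threshold)."""
    best = (-1.0, None)
    for t in sorted(set(scores)):
        tp = fp = tn = fn = 0
        for y, s in zip(labels, scores):
            predicted_positive = s >= t
            if predicted_positive and y:
                tp += 1
            elif predicted_positive and not y:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
        m = mcc(tp, fp, tn, fn)
        if m > best[0]:
            best = (m, t)
    return best

# Hypothetical toy data: 2 positives out of 10, mirroring a 20% positive rate.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.35, 0.15, 0.05]
best_mcc, best_threshold = max_mcc(labels, scores)
print(f"max MCC = {best_mcc:.3f} at threshold {best_threshold}")
```

Because MCC uses all four cells of the confusion matrix, it is less easily inflated than accuracy when only about one in five examples is positive, which is presumably why it is reported alongside AUROC and AUPR here.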