Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR.
Bioinformatics. 2018 Sep 1;34(17):i656-i663. doi: 10.1093/bioinformatics/bty554.
Predicting off-target mutations in CRISPR-Cas9 is an active research topic because of its relevance to gene editing. Several prediction methods exist; however, most of them only compute scores based on mismatches to the guide sequence. As a result, they cannot scale or improve their performance as experimental CRISPR-Cas9 data rapidly expand. Moreover, existing methods are still not precise enough for off-target prediction in gene editing at the clinical level.
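To make the idea of a mismatch-based score concrete, the following is a minimal sketch, not the scoring rule of any published tool. The function names, the uniform weighting, and the example sequences are assumptions introduced here for illustration only.

```python
def count_mismatches(guide: str, target: str) -> int:
    """Count positions where the guide and the candidate site differ."""
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and of equal length")
    return sum(g != t for g, t in zip(guide.upper(), target.upper()))


def naive_mismatch_score(guide: str, target: str) -> float:
    """Toy score in [0, 1]: 1.0 for a perfect match, lower with each mismatch.

    Assumption: every position is weighted equally, which is a deliberate
    simplification made for this illustration.
    """
    return 1.0 - count_mismatches(guide, target) / len(guide)


# Hypothetical 23-nt sequences used only for illustration.
guide = "GAGTCCGAGCAGAAGAAGAAGGG"
candidate = "GAGTCCGAGCAGAAGTAGAATGG"  # differs from the guide at two positions

print(count_mismatches(guide, candidate))
print(naive_mismatch_score(guide, guide))
```

A score of this kind is a fixed function of the two sequences: it has no trainable parameters, so additional experimental data cannot change its predictions.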
To address these limitations, we design and implement two deep neural network algorithms for predicting off-target mutations in CRISPR-Cas9 gene editing: a deep convolutional neural network and a deep feedforward neural network. The models were trained and tested on the recently released CRISPOR off-target dataset for performance benchmarking. A second off-target dataset, identified by GUIDE-seq, was used for additional evaluation. We demonstrate that the convolutional neural network achieves the best performance on the CRISPOR dataset, with an average classification area under the ROC curve (AUC) of 97.2% under stratified 5-fold cross-validation. Interestingly, the deep feedforward neural network is also competitive, with an average AUC of 97.0% under the same setting. We compare the two deep neural network models with state-of-the-art off-target prediction methods (i.e. CFD, MIT, CROP-IT, and CCTop) and three traditional machine learning models (i.e. random forest, gradient boosting trees, and logistic regression) on both datasets in terms of AUC, demonstrating the competitive edge of the proposed algorithms. Additional analyses are conducted to investigate the underlying reasons from different perspectives.
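The evaluation protocol named above (average AUC under stratified 5-fold cross-validation) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names, the toy one-dimensional features, and the `centroid_fit` stand-in model are all hypothetical and replace the neural networks described in the abstract.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds with roughly equal class ratios per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin
    return folds


def cross_validated_auc(features, labels, fit, k=5, seed=0):
    """Mean held-out AUC over k stratified folds.

    `fit(train_x, train_y)` must return a function mapping a feature to a score.
    """
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        aucs.append(auc([labels[i] for i in test_idx],
                        [score(features[i]) for i in test_idx]))
    return mean(aucs)


def centroid_fit(train_x, train_y):
    """Hypothetical stand-in model: closer to the positive mean scores higher."""
    pos_mean = mean(x for x, y in zip(train_x, train_y) if y == 1)
    neg_mean = mean(x for x, y in zip(train_x, train_y) if y == 0)
    return lambda x: abs(x - neg_mean) - abs(x - pos_mean)


# Toy data (assumption): one numeric feature per site, 10 positives, 20 negatives.
toy_x = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2] + [4, 5, 6, 5, 4, 6, 3, 5, 4, 6] * 2
toy_y = [1] * 10 + [0] * 20

print(cross_validated_auc(toy_x, toy_y, centroid_fit, k=5))
```

Stratification matters here because off-target datasets typically contain far fewer validated off-target sites than negative candidates; dealing each class across the folds separately keeps every test fold from ending up with no positives, which would make its AUC undefined.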
The example code is available at https://github.com/MichaelLinn/off_target_prediction. The related datasets are available at https://github.com/MichaelLinn/off_target_prediction/tree/master/data.