College of Engineering, Shantou University, Shantou 515063, China.
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China.
Int J Mol Sci. 2024 Oct 11;25(20):10945. doi: 10.3390/ijms252010945.
CRISPR/Cas9 is a popular genome editing technology, yet its clinical application is hindered by off-target effects. Many deep learning-based methods are available for off-target prediction. However, few can predict off-target activity when insertions or deletions (indels) occur between single guide RNA (sgRNA) and DNA sequence pairs. In addition, off-target datasets are imbalanced, which complicates their analysis, and both prediction accuracy and interpretability remain to be improved. Here, we introduce a deep learning-based framework, named Crispr-SGRU, to predict off-target activities with mismatches and indels. The model is based on Inception and a stacked bidirectional GRU (BiGRU), and it adopts a Dice loss function to address the inherent data imbalance. Experimental results show that our model outperforms existing off-target prediction methods in terms of accuracy and robustness. Finally, we study the interpretability of the model through Deep SHAP and teacher-student knowledge distillation, and find that it provides meaningful explanations of the sequence patterns associated with off-target activity.
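The abstract names a Dice loss as the remedy for data imbalance but does not give its formula. As a rough illustration only, the sketch below implements the commonly used soft Dice loss for binary labels, 1 - (2·Σ yᵢpᵢ + s) / (Σ yᵢ + Σ pᵢ + s). The function name `dice_loss` and the smoothing term `smooth` are assumptions made for this example and are not taken from Crispr-SGRU, whose exact formulation may differ.

```python
def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss for binary labels (illustrative sketch, not the paper's code).

    y_true: sequence of 0/1 labels (1 = off-target site, the rare class).
    y_pred: sequence of predicted probabilities in [0, 1].
    smooth: hypothetical smoothing constant that avoids division by zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Overlap between predictions and positives; true negatives contribute nothing.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    denominator = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


if __name__ == "__main__":
    # One positive among 100 samples: a toy imbalanced batch.
    labels = [1] + [0] * 99
    all_negative = [0.0] * 100          # ignores the positive class entirely
    perfect = [1.0] + [0.0] * 99        # recovers the single positive
    print(dice_loss(labels, all_negative))
    print(dice_loss(labels, perfect))
```

Because true negatives do not enter the numerator or the denominator, a predictor that labels every site as negative still incurs a substantial loss in this toy batch even though it would be 99% accurate, which is the usual motivation for choosing a Dice-style loss on imbalanced data.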