Cao Mingming, Brennan Alexander, Lee Ciaran M, Park So-Hyun, Bao Gang
Department of Bioengineering, Rice University, Houston, TX, 77030, USA.
School of Biochemistry and Cell Biology, University College Cork, Cork, T12 K8AF, Ireland.
Small Methods. 2025 Jul;9(7):e2500122. doi: 10.1002/smtd.202500122. Epub 2025 Jun 4.
CRISPR/Cas genome editing technologies enable effective and controlled genetic modifications; however, off-target effects remain a significant concern, particularly in clinical applications. Both experimental and in silico methods have been developed to predict potential off-target sites (OTS). Among these, deep learning based methods, which can automatically and comprehensively learn sequence features, offer a promising tool for OTS prediction. Here, this work reviews the existing OTS prediction tools with an emphasis on deep learning methods, characterizes the datasets used for deep learning training and testing, and evaluates six deep learning models (CRISPR-Net, CRISPR-IP, R-CRISPR, CRISPR-M, CrisprDNT, and Crispr-SGRU) using six public datasets and validated OTS data from the CRISPRoffT database. The performance of these models is assessed using standardized metrics, such as Precision, Recall, F1 score, MCC, AUROC, and PRAUC. This work finds that incorporating validated OTS datasets into model training enhances overall model performance and improves the robustness of prediction, particularly on highly imbalanced datasets. While no model consistently outperforms the others across all scenarios, CRISPR-Net, R-CRISPR, and Crispr-SGRU show strong overall performance. This analysis demonstrates the importance of integrating high-quality validated OTS data with advanced deep learning architectures to improve CRISPR/Cas off-target site prediction and thereby support safer genome editing applications.
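The threshold-based metrics named in the abstract (Precision, Recall, F1 score, MCC) can be illustrated with a small sketch. It is not taken from the paper: the function names and the toy counts (10 true off-target sites among 1000 candidates) are hypothetical, chosen only to show why plain accuracy can mislead on highly imbalanced OTS data. AUROC and PRAUC require continuous model scores and are not covered here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = off-target site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1, and MCC; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical toy data: 10 true OTS among 1000 candidate sites.
y_true = [1] * 10 + [0] * 990
# A hypothetical model flags 8 sites, 5 of which are true OTS.
y_pred = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 987

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy exceeds 0.99 even though only half of the true sites are recovered, which is why imbalance-aware metrics such as F1 and MCC are reported alongside it.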