College of Engineering, Shantou University, Shantou 515063, China.
School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad333.
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuing efforts aim to design sgRNAs with efficient on-target activity and fewer off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, a systematic evaluation of their predictive abilities is worthwhile. In this review, we systematically surveyed progress in predicting on- and off-target editing. We evaluated 10 mainstream deep learning-based on-target predictors on nine public datasets of different sample sizes. In most scenarios, these methods showed greater predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to compare in depth eight representative off-target prediction approaches on 12 publicly available datasets with various imbalance ratios of positive to negative samples. Most methods performed excellently on balanced datasets but left much room for improvement on moderately and severely imbalanced datasets. This study provides a comprehensive perspective on CRISPR/Cas9 sgRNA on- and off-target activity prediction and suggests directions for improving method development.