Center for Life Sciences, Skolkovo Institute of Science and Technology, Moscow 143026, Russia.
Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia.
Nucleic Acids Res. 2022 Jan 25;50(2):e11. doi: 10.1093/nar/gkab1065.
The choice of guide RNA (gRNA) for CRISPR-based gene targeting is an essential step in gene editing applications, but predicting gRNA specificity remains challenging. Existing Deep Learning-based methods are limited by a lack of transparency and by their focus on point estimates of efficiency, which disregard information on possible sources of error in the model. To overcome these problems, we present a new approach, a hybrid of Capsule Networks and Gaussian Processes. Our method predicts the cleavage efficiency of a gRNA together with a corresponding confidence interval, which allows the user to incorporate information about possible model errors into the experimental design. This is the first use of uncertainty estimation in computational gRNA design, a critical step toward accurate decision-making in future CRISPR applications. The proposed solution yields acceptable confidence intervals on most test sets and regression quality similar to that of existing models. We introduce a set of criteria for gRNA selection based on off-target cleavage efficiency and its variance, and we present a collection of pre-computed gRNAs for human chromosome 22. Using Neural Network Interpretation methods, we show that our model rediscovers an established biological factor underlying cleavage efficiency: the importance of the seed region in the gRNA.
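To illustrate how a predicted efficiency and its variance might feed into gRNA selection, here is a minimal sketch in Python. It is not the authors' implementation or their published criteria. It assumes a Gaussian predictive distribution, a 95% interval, and an arbitrary cutoff (`max_upper`) on the upper bound of off-target cleavage efficiency. All names (`OffTargetPrediction`, `confidence_interval`, `select_grnas`) are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed Gaussian errors)


@dataclass
class OffTargetPrediction:
    grna: str        # gRNA identifier (hypothetical)
    site: str        # off-target site identifier (hypothetical)
    mean: float      # predicted cleavage efficiency at this site
    variance: float  # predictive variance reported by the model


def confidence_interval(mean, variance, z=Z_95):
    """Return (lower, upper) bounds of a symmetric Gaussian interval."""
    half_width = z * sqrt(variance)
    return mean - half_width, mean + half_width


def select_grnas(predictions, max_upper=0.2):
    """Keep gRNAs whose every off-target site has an upper bound below max_upper.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    worst_upper = {}
    for p in predictions:
        _, upper = confidence_interval(p.mean, p.variance)
        worst_upper[p.grna] = max(worst_upper.get(p.grna, float("-inf")), upper)
    return sorted(g for g, u in worst_upper.items() if u < max_upper)


if __name__ == "__main__":
    preds = [
        OffTargetPrediction("gRNA_A", "site1", mean=0.05, variance=0.0025),
        OffTargetPrediction("gRNA_A", "site2", mean=0.01, variance=0.0001),
        OffTargetPrediction("gRNA_B", "site3", mean=0.10, variance=0.01),
    ]
    print(select_grnas(preds))
```

Filtering on the upper bound rather than the mean penalizes gRNAs whose off-target predictions are uncertain, which is one way to fold model error into experimental design.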