Vinodkumar Prasoon Kumar, Ozcinar Cagri, Anbarjafari Gholamreza
iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia.
PwC Advisory Finland, 00180 Helsinki, Finland.
Entropy (Basel). 2021 May 14;23(5):608. doi: 10.3390/e23050608.
CRISPR/Cas9 is a powerful genome-editing technology that has been widely applied in targeted gene repair and gene expression regulation. One of the main challenges for the CRISPR/Cas9 system is unexpected cleavage at some sites (off-targets), and predicting these sites is essential given their relevance to gene-editing research. So far, only a few deep learning models have been developed to predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments. These models rely on manual feature extraction and machine learning techniques, which makes them convoluted and difficult for researchers to understand and implement. In this research work, we introduce a novel graph-based approach to predict the off-target efficacy of sgRNA in the CRISPR/Cas9 system that researchers can easily understand and replicate. We construct a graph in which sequences are nodes and use a link prediction method to predict the presence of links between sgRNAs and the target DNA sequences at which they induce off-target cleavage. The features of each sequence are extracted from the sequence itself. We used the HEK293 and K562 datasets in our experiments. Using link prediction, a graph convolutional network (GCN) predicted off-target gene knockouts by predicting links between sgRNA and off-target sequences, achieving an auROC of 0.987.
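The abstract gives no implementation details, so the code below is only a minimal sketch of the general idea, written under stated assumptions. It is not the authors' model. Each sequence becomes a node, and its features come from the sequence itself. Here those features are a one-hot nucleotide encoding, which is an assumption. One graph convolution step follows the standard form ReLU(Â·H·W), where Â = D^-1/2 (A + I) D^-1/2. A candidate sgRNA-target link is then scored by the sigmoid of the dot product of the two node embeddings. Finally, auROC is computed as the probability that a positive pair outranks a negative one. All function names, toy sequences, and the untrained random weights are hypothetical. The sketch is pure Python so it stays self-contained.

```python
import math
import random

NUCS = "ACGT"


def one_hot(seq):
    """Flattened one-hot encoding (assumed feature scheme); U is mapped to T, other bases to zeros."""
    vec = []
    for base in seq.upper().replace("U", "T"):
        vec.extend(1.0 if base == b else 0.0 for b in NUCS)
    return vec


def normalized_adjacency(n, edges):
    """Dense D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain list-of-lists matrix product x @ y."""
    return [[sum(xi[k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))] for xi in x]


def gcn_layer(a_hat, h, w):
    """One GCN propagation step: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def link_score(z, u, v):
    """Probability-like link score: sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(z[u], z[v]))
    return 1.0 / (1.0 + math.exp(-dot))


def auroc(labels, scores):
    """auROC as P(positive score > negative score), with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy graph: node 0 is an sgRNA, nodes 1-3 are candidate target sites.
    seqs = ["GACGCAUAAAGAUGAGACGC", "GACGCATAAAGATGAGACGC",
            "GACGCATAAAGATGAGACGA", "TTTTCCCCGGGGAAAATTTT"]
    x = [one_hot(s) for s in seqs]
    edges = [(0, 1), (0, 2)]  # hypothetical known sgRNA-target links
    a_hat = normalized_adjacency(len(seqs), edges)
    rng = random.Random(0)
    w = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(len(x[0]))]  # untrained weights
    z = gcn_layer(a_hat, x, w)
    print([round(link_score(z, 0, t), 3) for t in (1, 2, 3)])
```

Because the weights here are random, the printed scores only illustrate the mechanics. A real link-prediction setup would differ in two ways. First, it would hold the scored edges out of the adjacency matrix. Second, it would train W with a binary cross-entropy loss on positive pairs and sampled negative pairs before computing auROC on held-out pairs.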