Zhang Zhong-Rui, Jiang Zhen-Ran
School of Computer Science and Technology, East China Normal University, Shanghai 200062, China.
Comput Struct Biotechnol J. 2022 Jan 19;20:650-661. doi: 10.1016/j.csbj.2022.01.006. eCollection 2022.
The CRISPR/Cas9 gene-editing system is a third-generation gene-editing technology that has been widely used in biomedical applications. However, off-target effects in the CRISPR/Cas9 system remain a challenging problem in practical applications. Although many models have been developed to predict off-target activity, current models do not use sequence pair information effectively, so there is still room to improve prediction accuracy. This study aims to use sequence pair information effectively to improve the performance of off-target activity prediction. We propose a new coding scheme for sequence pairs and design a new model, CRISPR-IP, for predicting off-target activity. Our coding scheme uses a function channel to distinguish regions of the sequence pair that have different functions, and type channels to distinguish single bases from base pairs, so that the sequence pair information is represented effectively. The CRISPR-IP model combines a CNN, a BiLSTM, and an attention layer to learn features of sequence pairs. We evaluated performance on two data sets and found that our coding scheme represents sequence pair information effectively and that CRISPR-IP outperforms other models. Data and source code are available at https://github.com/BioinfoVirgo/CRISPR-IP.
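The abstract describes the coding scheme only at a high level, so the following is a minimal sketch of how a sequence pair might be encoded with base, type, and function channels, not the authors' implementation. The seven-channel layout, the function name `encode_pair`, the `pam_len` parameter, and the treatment of `N` as a wildcard are all assumptions made for illustration; the repository linked above holds the actual scheme.

```python
BASES = "ACGT"


def encode_pair(sgrna, dna, pam_len=3):
    """Encode an aligned sgRNA/DNA pair as one 7-value vector per position.

    Assumed (hypothetical) channel layout per position:
      0-3  base channels: OR of the one-hot codes of the two bases (A, C, G, T)
      4-5  type channels: [1, 0] for a matched position (a single base),
                          [0, 1] for a mismatched position (a base pair)
      6    function channel: 1 inside the PAM region, 0 elsewhere
    """
    if len(sgrna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    n = len(sgrna)
    encoded = []
    for i, (g, d) in enumerate(zip(sgrna.upper(), dna.upper())):
        # Assumption: 'N' in the guide (e.g. an NGG PAM) matches any DNA base.
        if g == "N":
            g = d
        base = [int(b in (g, d)) for b in BASES]
        pair_type = [1, 0] if g == d else [0, 1]
        # Assumption: the PAM occupies the last pam_len positions.
        function = [1 if i >= n - pam_len else 0]
        encoded.append(base + pair_type + function)
    return encoded


if __name__ == "__main__":
    # Toy 7-nt example with one mismatch at position 2 (G in guide, T in DNA).
    for row in encode_pair("ACGTAGG", "ACTTAGG"):
        print(row)
```

In this sketch a mismatched position sets two base channels at once, and the type channels record that the position holds a base pair rather than a single base. A matrix of this shape could then be fed to a network with convolutional, recurrent, and attention layers such as the one the abstract describes.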