School of Computer Science and Technology, East China Normal University, Shanghai 200062, China.
Brief Bioinform. 2023 May 19;24(3). doi: 10.1093/bib/bbad127.
Off-target effects in the CRISPR-Cas9 system remain a major obstacle to the practical application of this gene-editing technology. In recent years, various models have been proposed to predict potential off-target activity. However, most existing methods do not fully exploit the information in guide RNA (gRNA) and DNA sequence pairs. In addition, they usually ignore noise in the original off-target datasets. To address these issues, we design a novel coding scheme that captures mismatch type, mismatch location and gRNA-DNA sequence pair information. Furthermore, we develop a transformer-based anti-noise model, CrisprDNT, to handle the noise present in off-target data. Experimental results on eight existing datasets demonstrate that the model trained with the anti-noise loss functions outperforms available state-of-the-art prediction methods. CrisprDNT is available at https://github.com/gzrgzx/CrisprDNT.
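The abstract does not spell out the coding scheme, so the following is only an illustrative sketch of how a gRNA-DNA pair could be encoded so that mismatch type, mismatch location and the sequence pair itself are all visible to a model. The function name `encode_pair`, the 9-column row layout and the use of the DNA alphabet for the gRNA are assumptions made for this example, not the encoding used by CrisprDNT.

```python
from typing import List

BASES = "ACGT"  # assumption: the gRNA is written in the DNA alphabet (T instead of U)


def encode_pair(grna: str, dna: str) -> List[List[int]]:
    """Encode an aligned gRNA-DNA pair as one feature row per position.

    Hypothetical row layout: 4 one-hot bits for the gRNA base,
    4 one-hot bits for the DNA base, then 1 mismatch flag.
    The row index carries the mismatch location; the two one-hot
    groups together carry the mismatch type (e.g. T paired with A).
    """
    if len(grna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    rows = []
    for g, d in zip(grna.upper(), dna.upper()):
        g_bits = [int(g == b) for b in BASES]  # all zeros for symbols outside BASES
        d_bits = [int(d == b) for b in BASES]
        rows.append(g_bits + d_bits + [int(g != d)])
    return rows


# Usage: a 4-nt toy pair with a single mismatch at the last position.
rows = encode_pair("ACGT", "ACGA")
```

In this sketch the resulting matrix has one row per aligned position, so a sequence model such as a transformer could consume it directly as a token sequence. Real off-target data may also contain gaps or ambiguous bases, which this toy layout maps to all-zero one-hot groups.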