Accurate deep learning off-target prediction with novel sgRNA-DNA sequence encoding in CRISPR-Cas9 gene editing.

Authors

Charlier Jeremy, Nadon Robert, Makarenkov Vladimir

Affiliations

Département d'Informatique, Université du Québec à Montréal, Montréal, QC H2X 3Y7, Canada.

McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0C7, Canada.

Publication information

Bioinformatics. 2021 Aug 25;37(16):2299-2307. doi: 10.1093/bioinformatics/btab112.

Abstract

MOTIVATION

Off-target predictions are crucial in gene editing research. Deep learning has recently driven significant progress in predicting off-target mutations, particularly with CRISPR-Cas9 data. CRISPR-Cas9 is a gene editing technique that allows manipulation of DNA fragments. However, the way sgRNA-DNA (single guide RNA-DNA) sequences are encoded for deep neural networks strongly affects prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information.
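The abstract does not spell out the encoding, so the following is only a minimal sketch of what "aggregating with no loss of information" can mean, under stated assumptions: each sequence is one-hot encoded over A, C, G, T (with the guide written in DNA letters), and the two matrices are either merged by an element-wise OR (lossy) or stacked row-wise (lossless). The function names are hypothetical and are not taken from the dl-offtarget repository.

```python
# Illustrative sketch only; names and layout are assumptions, not the authors' code.
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Return a 4 x len(seq) one-hot matrix with rows ordered A, C, G, T."""
    return [[1 if base == nt else 0 for base in seq] for nt in NUCLEOTIDES]


def encode_or(sgrna, dna):
    """Lossy 4 x L encoding: element-wise OR of the two one-hot matrices."""
    return [
        [x | y for x, y in zip(row_g, row_d)]
        for row_g, row_d in zip(one_hot(sgrna), one_hot(dna))
    ]


def encode_stacked(sgrna, dna):
    """8 x L encoding: sgRNA rows on top of DNA rows, so both stay recoverable."""
    return one_hot(sgrna) + one_hot(dna)


def decode_stacked(matrix):
    """Invert encode_stacked to show that no information was lost."""
    def decode(rows):
        length = len(rows[0])
        return "".join(
            NUCLEOTIDES[[row[i] for row in rows].index(1)] for i in range(length)
        )
    return decode(matrix[:4]), decode(matrix[4:])


# A mismatch and its mirror image collapse to the same OR matrix,
# but remain distinct (and decodable) in the stacked form.
print(encode_or("AC", "AG") == encode_or("AG", "AC"))            # True
print(encode_stacked("AC", "AG") == encode_stacked("AG", "AC"))  # False
print(decode_stacked(encode_stacked("GATTACA", "GACTACA")))
```

In this toy setting the OR merge cannot tell which strand carried which base at a mismatch, while the stacked form keeps that direction, which is the kind of property a lossless encoding needs.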

RESULTS

In our experiments, we compare the proposed sgRNA-DNA sequence encoding, applied in a deep learning prediction framework, with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers. We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular gene editing datasets (CRISPOR and GUIDE-seq). In all our experiments, the new encoding led to more accurate off-target predictions, improving the area under the Receiver Operating Characteristic (ROC) curve by up to 35%.
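Since the results are reported as area under the ROC curve, a small sketch of that metric may help. It uses the standard pairwise interpretation: the AUC is the probability that a randomly chosen positive (true off-target) receives a higher score than a randomly chosen negative, with ties counted as one half. The function name and the toy labels and scores are made up for illustration and are not results from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = off-target site, 0 = no off-target activity.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in the number of samples, so it is only meant to make the definition concrete; a library routine would be the practical choice on datasets of realistic size.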

AVAILABILITY AND IMPLEMENTATION

The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
