

DeepCRISTL: deep transfer learning to predict CRISPR/Cas9 on-target editing efficiency in specific cellular contexts.

Affiliations

School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel.

Department of Computer Science, Bar-Ilan University, Ramat Gan 5290002, Israel.

Publication information

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae481.

Abstract

MOTIVATION

CRISPR/Cas9 technology has been revolutionizing the field of gene editing. Guide RNAs (gRNAs) enable Cas9 proteins to target specific genomic loci for editing. However, editing efficiency varies between gRNAs and so computational methods were developed to predict editing efficiency for any gRNA of interest. High-throughput datasets of Cas9 editing efficiencies were produced to train machine-learning models to predict editing efficiency. However, these high-throughput datasets have a low correlation with functional and endogenous datasets, which are too small to train accurate machine-learning models on.

RESULTS

We developed DeepCRISTL, a deep-learning model to predict the editing efficiency in a specific cellular context. DeepCRISTL takes advantage of high-throughput datasets to learn general patterns of gRNA editing efficiency and then fine-tunes the model on functional or endogenous data to fit a specific cellular context. We tested two state-of-the-art models trained on high-throughput datasets for editing efficiency prediction, our newly improved DeepHF and CRISPRon, combined with various transfer-learning approaches. The combination of CRISPRon and fine-tuning all model weights was the overall best performer. DeepCRISTL outperformed state-of-the-art methods in predicting editing efficiency in a specific cellular context on functional and endogenous datasets. Using saliency maps, we identified and compared the important features learned by DeepCRISTL across cellular contexts. We believe DeepCRISTL will improve prediction performance in many other CRISPR/Cas9 editing contexts by leveraging transfer learning to utilize both high-throughput datasets and smaller and more biologically relevant datasets.
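The pre-train-then-fine-tune workflow described above can be illustrated with a minimal sketch. This is not the DeepCRISTL implementation: the abstract's models (DeepHF, CRISPRon) are deep networks, whereas the code below substitutes a linear model on one-hot encoded sequences so that it runs with the Python standard library alone. The data are synthetic, and every name, size, and hyperparameter (GUIDE_LEN, make_dataset, the baseline shift of -2.0, the learning rate) is an assumption chosen for illustration only.

```python
import random

BASES = "ACGT"
GUIDE_LEN = 20  # assumed guide length for this toy example

def one_hot(seq):
    """Flatten a nucleotide sequence into a one-hot feature vector."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def predict(weights, bias, x):
    return bias + sum(w * xi for w, xi in zip(weights, x))

def train(weights, bias, data, lr=0.01, epochs=20):
    """Per-sample gradient descent on squared error, updating every weight."""
    weights = list(weights)  # copy so the caller's model is left untouched
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def mse(weights, bias, data):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)

def make_dataset(rng, n, effects, offset):
    """Synthetic (features, efficiency) pairs from per-position base effects."""
    data = []
    for _ in range(n):
        seq = "".join(rng.choice(BASES) for _ in range(GUIDE_LEN))
        x = one_hot(seq)
        y = offset + sum(e * xi for e, xi in zip(effects, x)) + rng.gauss(0, 0.1)
        data.append((x, y))
    return data

rng = random.Random(0)
n_features = GUIDE_LEN * len(BASES)
general = [rng.uniform(-1, 1) for _ in range(n_features)]

# Hypothetical cellular context: same general pattern, a shifted baseline,
# and altered base effects at the last four positions.
context = list(general)
for i in range(n_features - 16, n_features):
    context[i] += rng.uniform(-0.5, 0.5)

high_throughput = make_dataset(rng, 500, general, offset=0.0)
context_train = make_dataset(rng, 40, context, offset=-2.0)
context_test = make_dataset(rng, 200, context, offset=-2.0)

# Step 1: pre-train on the large, general dataset.
pre_w, pre_b = train([0.0] * n_features, 0.0, high_throughput)
# Step 2: fine-tune all weights on the small context-specific dataset.
ft_w, ft_b = train(pre_w, pre_b, context_train)

mse_pretrained = mse(pre_w, pre_b, context_test)
mse_finetuned = mse(ft_w, ft_b, context_test)
print(f"pre-trained only: {mse_pretrained:.3f}")
print(f"fine-tuned:       {mse_finetuned:.3f}")
```

In this toy setting the fine-tuned model starts from the general pattern and only has to learn the context-specific shift, which is why a few dozen samples are enough to lower its held-out error. Fine-tuning all weights mirrors the strategy the abstract reports as the best overall performer; the other transfer-learning variants it mentions are not shown here.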

AVAILABILITY AND IMPLEMENTATION

DeepCRISTL is available via https://github.com/OrensteinLab/DeepCRISTL.


