Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding.

Authors

Sari Orhan, Liu Ziying, Pan Youlian, Shao Xiaojian

Affiliations

Department of Mining and Materials Engineering, McGill University, Montreal, QC, H3A 2B1, Canada.

Digital Technologies Research Center, National Research Council Canada, Ottawa, ON, K1A 0R6, Canada.

Publication information

Bioinform Adv. 2024 Dec 30;5(1):vbae184. doi: 10.1093/bioadv/vbae184. eCollection 2025.

Abstract

MOTIVATION

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 system is a groundbreaking genome editing tool that has revolutionized cell and gene therapies. An essential factor in its success is the design of an optimal single-guide RNA (sgRNA) with high on-target cleavage efficiency and low off-target effects. This is challenging because many conditions must be considered, and empirically testing every design is time-consuming and costly. Prediction using machine learning models provides a high-performance alternative.

RESULTS

We present CrisprBERT, a deep learning model that predicts the off-target effects of sgRNAs using only the sgRNAs and their paired DNA sequences. It combines a Bidirectional Encoder Representations from Transformers (BERT) architecture, which provides a high-dimensional embedding of paired sgRNA and DNA sequences, with bidirectional long short-term memory (BiLSTM) networks for learning. We propose doublet stack encoding to capture the local energy configuration of Cas9 binding and apply the BERT model to learn contextual embeddings of the doublet pairs. The new model outperformed state-of-the-art deep learning models in single-split and leave-one-sgRNA-out cross-validation as well as in independent testing.
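
The abstract does not spell out the doublet stack encoding, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a "doublet" is two adjacent aligned positions, each contributing one sgRNA base and one DNA base, and that every such doublet maps to one token id that a BERT-style embedding layer could consume. The names `DOUBLET_VOCAB` and `doublet_stack_encode` are hypothetical, and the vocabulary ordering is arbitrary.

```python
from itertools import product
from typing import List

BASES = "ACGT"

# Hypothetical vocabulary (an assumption, not taken from the paper):
# one token per combination of two sgRNA bases and two DNA bases,
# giving 4 ** 4 = 256 possible doublet tokens.
DOUBLET_VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=4))}


def doublet_stack_encode(sgrna: str, dna: str) -> List[int]:
    """Encode an aligned sgRNA/DNA pair as overlapping doublet token ids.

    Position i of the output covers aligned positions i and i + 1, so a
    pair of length n yields n - 1 tokens.
    """
    # Treat RNA "U" as "T" so both strands share one alphabet.
    sgrna = sgrna.upper().replace("U", "T")
    dna = dna.upper()
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA must be aligned to the same length")
    if len(sgrna) < 2:
        raise ValueError("need at least two positions to form a doublet")

    tokens = []
    for i in range(len(sgrna) - 1):
        # Stack two adjacent sgRNA bases on top of the two paired DNA bases.
        key = sgrna[i:i + 2] + dna[i:i + 2]
        if key not in DOUBLET_VOCAB:
            raise ValueError(f"unsupported base in doublet {key!r}")
        tokens.append(DOUBLET_VOCAB[key])
    return tokens


if __name__ == "__main__":
    # Toy 4-nt example with one mismatch at the last position.
    print(doublet_stack_encode("ACGT", "ACGA"))
```

Under these assumptions, a 23-nt sgRNA/DNA pair becomes a sequence of 22 token ids. Because each token spans two neighbouring base pairs, a mismatch affects the two tokens that overlap it, which is one way an encoding could expose local stacking context to the downstream model.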

AVAILABILITY AND IMPLEMENTATION

CrisprBERT is available on GitHub: https://github.com/OSsari/CrisprBERT.
