Du Weian, Zhao Liang, Diao Kaichuan, Zheng Yangyang, Yang Qianyong, Zhu Zhenzhen, Zhu Xiangxing, Tang Dongsheng
Gene Editing Technology Center of Guangdong Province, School of Medicine, Foshan University, Foshan, Guangdong, China.
Shenzhen Health Development Research and Data Management Center, Shenzhen, Guangdong, China.
Commun Biol. 2025 Jun 6;8(1):882. doi: 10.1038/s42003-025-08275-6.
Genome editing with the CRISPR/Cas9 system has revolutionized life and medical sciences, particularly in treating monogenic genetic diseases by enabling long-term therapeutic effects from a single intervention. However, the CRISPR/Cas9 system can tolerate mismatches and DNA/RNA bulges at target sites, leading to unintended off-target effects that pose challenges for gene-editing therapy development. Existing high-throughput detection and in silico prediction methods are often limited to specifically designed single guide RNAs (sgRNAs) and perform poorly on unseen sequences. To address these limitations, we introduce CCLMoff, a deep learning framework for off-target prediction that incorporates a pretrained RNA language model from RNAcentral. CCLMoff captures mutual sequence information between sgRNAs and target sites and is trained on a comprehensive, updated dataset. This approach enables accurate off-target identification and strong generalization across diverse NGS-based detection datasets. Model interpretation reveals the biological importance of the seed region, underscoring CCLMoff's analytical capabilities. The development of CCLMoff lays the foundation for a comprehensive, end-to-end sgRNA design platform, enhancing both the precision and efficiency of CRISPR/Cas9-based therapeutics. CCLMoff is a versatile tool and is publicly available at github.com/duwa2/CCLMoff.
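To make the notions of mismatch tolerance and the seed region concrete, the following minimal sketch compares an sgRNA spacer with a candidate site of the same length and reports how many mismatches fall in the PAM-proximal end. It is not CCLMoff's method or API: the function name `mismatch_profile`, the default `seed_len` of 10 nucleotides, and the assumption that the PAM lies 3' of the site are illustrative choices, and bulges are deliberately not handled.

```python
def mismatch_profile(sgrna: str, target: str, seed_len: int = 10) -> dict:
    """Count mismatches between a spacer and an equal-length candidate site.

    Assumptions (illustrative, not taken from CCLMoff):
    - both sequences are written 5'->3' with the PAM on the 3' side,
      so the seed region is the last `seed_len` positions;
    - no DNA/RNA bulges, i.e. the sequences align position by position.
    """
    # Treat RNA and DNA alphabets uniformly by mapping U to T.
    sgrna = sgrna.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if len(sgrna) != len(target):
        raise ValueError("sequences must have equal length; bulges are not handled")

    # 0-based positions, counted from the 5' (PAM-distal) end.
    positions = [i for i, (a, b) in enumerate(zip(sgrna, target)) if a != b]
    seed_start = len(sgrna) - seed_len
    seed_positions = [i for i in positions if i >= seed_start]

    return {
        "mismatches": len(positions),
        "positions": positions,
        "seed_mismatches": len(seed_positions),
    }


# Hypothetical 20-nt spacer and a candidate site differing at two positions.
example = mismatch_profile("GAGTCCGAGCAGAAGAAGAA", "AAGTCCGAGCAGAAGAAGCA")
```

A simple count like this treats every position equally and is only a baseline. A learned predictor can instead weight positions and sequence context.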