Chen Yuanting, Chen Long, Yu Xinxin, Li Weihua, Tang Yun, Liu Guixia
Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf443.
RNA interference (RNAi) is a technique for precisely silencing the expression of specific genes by means of small RNA molecules and is essential in functional genomics. Among the commonly used RNAi molecules, short hairpin RNAs (shRNAs) exhibit advantages over small interfering RNAs, including longer half-life, comparable silencing efficiency, fewer off-target effects, and greater safety. However, traditional screening of potent shRNAs is costly and time-consuming. Advances in big data and artificial intelligence have enabled computational methods to significantly accelerate shRNA design and prediction. In this study, we propose BBANsh, a new shRNA prediction model based on bidirectional encoder representations from transformers (BERT) and a bilinear attention network (BAN). We comprehensively evaluate the performance of BBANsh against traditional feature-based models, various feature fusion methods, and existing shRNA prediction models. BBANsh achieved an area under the precision-recall curve of 0.951 in five-fold cross-validation and a prediction accuracy of 0.896 on a new external validation set, highlighting its superior predictive performance. Ablation experiments validate the significant contributions of BERT and BAN to model performance. Visualization of the internal feature representations intuitively demonstrates the effectiveness of BBANsh's feature fusion strategy. Furthermore, attention analysis reveals that nucleotides near the 5' end have the greatest impact on model predictions, highlighting sequence characteristics of potent shRNAs. Overall, BBANsh provides an efficient and reliable tool for shRNA prediction, offering valuable support for researchers in the precise selection and design of shRNAs.
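For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an area under the precision-recall curve and an accuracy can be computed from predicted scores. It is not the authors' evaluation code. It assumes the step-wise (average precision) estimate of the area, distinct predicted scores, and a 0.5 decision threshold, none of which are stated in the abstract; the function names and the toy labels and scores are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumption: labels are 0/1 and scores contain no ties.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            total += true_positives / rank
    return total / n_pos


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical toy data: 1 = potent shRNA, 0 = weak shRNA.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]

print(round(average_precision(toy_labels, toy_scores), 4))
print(accuracy(toy_labels, toy_scores))
```

In a five-fold cross-validation setting, the same function would typically be applied to the held-out predictions of each fold and the results averaged or pooled; which of the two the authors used is not specified here.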