Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, 411105, Xiangtan, China.
College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China.
Brief Funct Genomics. 2022 Sep 16;21(5):399-407. doi: 10.1093/bfgp/elac023.
Identifying and classifying enhancers is important because enhancers play crucial roles in controlling gene transcription. Recently, several deep learning-based methods for identifying enhancers and their strengths have been developed. However, existing methods are usually limited because they use only local or only global features, whereas combining the two is critical for further improving prediction performance. In this work, we propose a novel deep learning-based method, called iEnhancer-DLRA, to identify enhancers and their strengths. iEnhancer-DLRA extracts local and multi-scale global features of sequences by using a residual convolutional network and two bidirectional long short-term memory networks. A self-attention fusion strategy is then proposed to deeply integrate these local and global features. The experimental results on the independent test dataset indicate that iEnhancer-DLRA performs better than nine existing state-of-the-art methods on almost all metrics, in both the identification and the classification of enhancers. iEnhancer-DLRA achieves 13.8% (for identifying enhancers) and 12.6% (for classifying strengths) improvement in accuracy compared with the best existing state-of-the-art method. This is the first time that the accuracy of an enhancer identifier has exceeded 0.9 and the accuracy of an enhancer strength classifier has exceeded 0.8 on the independent test set. Moreover, iEnhancer-DLRA achieves superior predictive performance on the rice dataset compared with the state-of-the-art method RiceENN.
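The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify how the self-attention fusion is parameterised, so the code below is an assumption-laden illustration rather than the iEnhancer-DLRA implementation: it treats one local feature vector (standing in for the residual convolutional branch) and two global feature vectors (standing in for the two bidirectional LSTM branches) as three tokens, applies plain scaled dot-product self-attention with no learned projections, and averages the attended vectors. The function names `softmax` and `self_attention_fuse` and the example values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(features):
    """Fuse equal-length feature vectors into one vector.

    Assumption: unparameterised scaled dot-product self-attention, where
    each feature vector acts as query, key and value. A trained model
    would normally apply learned projections first.
    """
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")

    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Similarity of this branch to every branch (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        # Weighted sum of all branch vectors for this query.
        attended.append([
            sum(w * value[i] for w, value in zip(weights, features))
            for i in range(dim)
        ])

    # Average the attended vectors to obtain a single fused representation.
    count = len(attended)
    return [sum(vec[i] for vec in attended) / count for i in range(dim)]


# Hypothetical branch outputs for one sequence (values are made up).
local_features = [0.2, 0.7, 0.1, 0.5]      # stand-in for the CNN branch
global_features_a = [0.4, 0.3, 0.9, 0.1]   # stand-in for the first BiLSTM
global_features_b = [0.6, 0.2, 0.8, 0.3]   # stand-in for the second BiLSTM

fused = self_attention_fuse(
    [local_features, global_features_a, global_features_b]
)
print(fused)
```

In this sketch the attention weights let each branch borrow information from the branches it most resembles, which is one simple way local and global features could be integrated before a final classifier.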