School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
Interdiscip Sci. 2022 Jun;14(2):555-565. doi: 10.1007/s12539-022-00503-5. Epub 2022 Feb 21.
Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Because enhancers vary widely in location and are scattered freely across non-coding regions of the genome, identifying them is a crucial but challenging task in understanding the biological mechanisms of model plants. Neural network models have recently become increasingly popular for predicting the function of genomic elements. Although several computational models have shown clear advantages in tackling this challenge, the identification of rice enhancers from DNA sequences has not yet been studied in depth. We present RicENN, a novel deep learning framework for accurately identifying rice enhancers that integrates convolutional neural networks (CNNs), bi-directional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract sequence features from the original DNA sequences using six types of autocorrelation encodings. An ablation study verified that the integrated model achieves the best performance. The results show that RicENN outperforms the available alternative approaches on rice, achieving an area under the receiver operating characteristic curve (AUROC) and an area under the precision-recall curve (AUPRC) of 0.960 and 0.960 in cross-validation, and 0.879 and 0.877 on independent tests, respectively. This study develops a hybrid model that combines the merits of different neural network architectures, demonstrates the potential of deep learning for biological sequence data, and contributes to accelerating functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/ .
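The abstract does not name the six autocorrelation encodings, so the following is only a minimal sketch of what one autocorrelation-style encoding of a DNA sequence can look like, not RicENN's actual feature extractor. It computes a dinucleotide autocovariance for lags 1 to `max_lag`. The function name `dinucleotide_autocovariance` and the table `TOY_PROPERTY` are hypothetical, and the property values are placeholders rather than real physicochemical indices.

```python
from itertools import product

# All 16 dinucleotides in lexicographic order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# ASSUMPTION: placeholder property values (AA=0.0, AC=1.0, ..., TT=15.0).
# A real encoding would use a measured physicochemical index instead.
TOY_PROPERTY = {d: float(i) for i, d in enumerate(DINUCLEOTIDES)}


def dinucleotide_autocovariance(seq, max_lag, prop=TOY_PROPERTY):
    """Return one autocovariance value per lag in 1..max_lag.

    Each overlapping dinucleotide is mapped to a property value; for a given
    lag, the feature is the mean product of the centered values that are
    `lag` positions apart. Sequences with symbols outside A/C/G/T raise
    KeyError in this sketch.
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the number of dinucleotides")
    mean = sum(values) / n
    features = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of (i, i + lag) pairs available
        cov = sum(
            (values[i] - mean) * (values[i + lag] - mean) for i in range(pairs)
        ) / pairs
        features.append(cov)
    return features


if __name__ == "__main__":
    print(dinucleotide_autocovariance("ACGTACGTAC", max_lag=3))
```

The output is a fixed-length vector (one value per lag) regardless of sequence length, which is what makes this family of encodings convenient as input to a neural network. Several such vectors, each from a different encoding or property, could be concatenated into a combined feature representation.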
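For readers unfamiliar with the two metrics reported in the abstract, the sketch below shows how AUROC and AUPRC can be computed from binary labels and predicted scores. It is an illustration in plain Python, not the evaluation code used for RicENN. The function names are hypothetical, and the AUPRC is approximated by average precision, which is one common estimator of the area under the precision-recall curve.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive scores higher than a
    random negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: the mean of the precision values measured at the
    rank of each positive example, with examples sorted by descending score.
    Tied scores are not grouped in this simplified version."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(auroc(y_true, y_score))              # 0.75
    print(average_precision(y_true, y_score))  # 0.8333...
```

A value of 0.5 for AUROC corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of examples, so a rank-based formulation would be preferable for large test sets.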