Niu Kun, Luo Ximei, Zhang Shumei, Teng Zhixia, Zhang Tianjiao, Zhao Yuming
College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
Front Genet. 2021 Mar 23;12:665498. doi: 10.3389/fgene.2021.665498. eCollection 2021.
Enhancers are regulatory DNA sequences that can be bound by specific proteins called transcription factors (TFs). Interactions between enhancers and TFs regulate specific genes by increasing the expression of their target genes. Enhancer identification and classification have therefore been a critical issue in the enhancer field. Unfortunately, suitable methods for identifying enhancers are still lacking. Previous research has mainly focused on features of enhancer function and interactions and has ignored sequence information. Recurrent neural networks (RNNs) and long short-term memory (LSTM) models are currently the most common methods for processing sequential data such as time series, and LSTM is better suited than a plain RNN to DNA sequences because its gating mechanism helps capture long-range dependencies. In this paper, we take advantage of LSTM to build a method named iEnhancer-EBLSTM (ensembles of bidirectional LSTM) to identify enhancers. iEnhancer-EBLSTM consists of two steps. First, we extract subsequences as features by sliding a 3-mer window along the DNA sequence. Second, the EBLSTM model identifies enhancers among the candidate input sequences. We use the datasets from the study of Quang H et al. as benchmarks. Experimental results on these datasets demonstrate the effectiveness of the proposed model.
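The first step, sliding a 3-mer window along a DNA sequence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a stride of 1 and maps each 3-mer to an integer through a hypothetical vocabulary (`VOCAB`, with 0 reserved for k-mers containing non-ACGT bases), a common way to prepare tokens for an embedding layer in front of a bidirectional LSTM. The paper's exact encoding may differ, and the EBLSTM ensemble itself is not shown.

```python
from itertools import product
from typing import List

def kmers(seq: str, k: int = 3) -> List[str]:
    """Slide a window of length k along seq one base at a time (stride 1, assumed)."""
    seq = seq.upper()
    # A sequence of length L yields L - k + 1 overlapping k-mers.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical vocabulary: all 4**3 = 64 possible 3-mers over A, C, G, T,
# indexed from 1 so that 0 can stand for k-mers with unknown bases (e.g. N).
VOCAB = {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=3))}

def encode(seq: str, k: int = 3) -> List[int]:
    """Turn a DNA sequence into a list of integer 3-mer tokens."""
    return [VOCAB.get(km, 0) for km in kmers(seq, k)]

if __name__ == "__main__":
    print(kmers("ACGTA"))   # ['ACG', 'CGT', 'GTA']
    print(encode("ACGTA"))  # [7, 28, 45]
```

Each integer token sequence could then be padded to a common length and passed to an embedding layer followed by a bidirectional LSTM; that downstream model is outside the scope of this sketch.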