IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):637-645. doi: 10.1109/TCBB.2022.3142019. Epub 2023 Feb 3.
Identifying enhancers is a critical task in bioinformatics because of their primary role in regulating gene expression. For this reason, various computational algorithms for enhancer identification have been proposed over the years, and increasingly many features have been extracted from the DNA sequence alone to boost performance. However, these methods neglect DNA structural information, an essential factor affecting the binding preferences of transcription factors for regulatory elements such as enhancers. Here, we propose SENIES, a DNA-shape-enhanced deep learning predictor that identifies enhancers and their strength. The predictor consists of two layers: the first distinguishes enhancers from non-enhancers, and the second predicts enhancer strength. In addition to two common sequence-derived features (one-hot and k-mer), DNA shape is introduced to describe the 3D structure of DNA sequences. Performance comparisons with state-of-the-art methods on public datasets demonstrate the effectiveness and robustness of our predictor. The code implementation of SENIES is publicly available at https://github.com/hlju-liye/SENIES.
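To make the two sequence-derived features named in the abstract concrete, the following is a minimal Python sketch of one-hot and k-mer encoding for a DNA sequence. It is an illustration under stated assumptions and not the SENIES implementation: the function names `one_hot` and `kmer_frequencies`, the choice of k = 3, the normalized-frequency form of the k-mer feature, and the all-zero encoding of ambiguous bases are hypothetical choices made here. DNA shape features are omitted because they require dedicated shape-prediction tooling.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed base ordering; the paper's ordering is not stated


def one_hot(seq):
    """Encode a DNA sequence as a len(seq) x 4 matrix of 0/1 values.

    Assumption: bases outside A/C/G/T (e.g. N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq, k=3):
    """Return normalized frequencies of all 4**k k-mers in lexicographic order.

    Assumption: windows containing an ambiguous base are skipped.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Guard against sequences shorter than k (no valid windows).
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    example = "ACGTNACG"  # toy sequence, not taken from any dataset
    print(one_hot(example)[:5])
    print(len(kmer_frequencies(example, k=3)))
```

The one-hot matrix keeps positional information, while the k-mer vector summarizes local composition independently of position, which is one reason the two are commonly used together.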