Liu Liwei, Tan Zhebin, Wei Yuxiao, Sun Qianhui
College of Science, Dalian Jiaotong University, Dalian 116028, China.
College of Software, Dalian Jiaotong University, Dalian 116028, China.
Comput Biol Chem. 2025 Feb;114:108284. doi: 10.1016/j.compbiolchem.2024.108284. Epub 2024 Nov 19.
Enhancers are genomic elements that boost the transcriptional activity of neighboring genes and are essential in regulating cell-specific gene expression. Accurately identifying and characterizing enhancers is therefore essential for understanding gene regulatory networks and the development of related diseases. This study introduces MPDL-Enhancer, a novel multi-perspective deep learning framework for enhancer characterization and identification. Enhancer sequences are encoded using the dna2vec model together with features derived from the structural properties of DNA sequences. These representations are then processed by a novel dual-scale deep neural network designed to capture both subtle correlations and extended interactions embedded in the semantic content of DNA. A Support Vector Machine classifier renders the final classification. Evaluated on an independent test dataset, the method outperformed existing computational techniques, with an accuracy (ACC) of 81.00%, a sensitivity (SN) of 79.00%, and a specificity (SP) of 83.00%. The dual-scale deep neural network and the feature representation strategy contributed to this performance improvement. Building on this foundation, we conducted an interpretability analysis of the model, which can help researchers identify key features and patterns that affect enhancer function, thereby promoting a deeper understanding of gene regulatory networks.
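For readers unfamiliar with the three reported metrics, the following minimal sketch shows how ACC, SN, and SP are conventionally computed from confusion-matrix counts. The counts used here are hypothetical, chosen only so that the arithmetic reproduces the reported percentages; the abstract does not state the size or composition of the independent test set, and the names `ConfusionCounts`, `accuracy`, `sensitivity`, and `specificity` are illustrative, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (enhancer = positive class)."""
    tp: int  # enhancers predicted as enhancers
    fn: int  # enhancers predicted as non-enhancers
    tn: int  # non-enhancers predicted as non-enhancers
    fp: int  # non-enhancers predicted as enhancers


def accuracy(c: ConfusionCounts) -> float:
    """ACC = (TP + TN) / (TP + FN + TN + FP)."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """SN = TP / (TP + FN), the fraction of true enhancers recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """SP = TN / (TN + FP), the fraction of non-enhancers correctly rejected."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts (an assumption, not data from the study): a balanced
# set of 200 positives and 200 negatives that matches the reported values.
counts = ConfusionCounts(tp=158, fn=42, tn=166, fp=34)

print(f"ACC = {accuracy(counts):.2%}")
print(f"SN  = {sensitivity(counts):.2%}")
print(f"SP  = {specificity(counts):.2%}")
```

On a balanced test set, ACC is the mean of SN and SP, which is consistent with the reported figures: (79.00% + 83.00%) / 2 = 81.00%.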