Zhu Yiqi, Sun Ailun
Department of Computer Science and Technology, College of Computer and Control Engineering, Northeast Forestry University, Harbin, China.
Front Genet. 2024 Jun 5;15:1411847. doi: 10.3389/fgene.2024.1411847. eCollection 2024.
Recognizing DNA-binding proteins (DBPs) is crucial to understanding biological functions such as replication, transcription, and repair. Although current sequence-based methods have shown some effectiveness, they often fail to exploit the full potential of deep learning for capturing complex patterns. This study introduces a novel model, LGC-DBP, which integrates Long Short-Term Memory (LSTM), Gated Inception Convolution, and an Improved Channel Attention mechanism to enhance DBP prediction. The model first transforms protein sequences into Position-Specific Scoring Matrices (PSSMs), which are then processed by our deep learning framework. Within this framework, Gated Inception Convolution combines gating units with the advantages of the Graph Convolutional Network (GCN) and dilated convolution, significantly surpassing traditional convolution methods. The Improved Channel Attention mechanism substantially enhances the model's responsiveness and accuracy by moving from a single input to three inputs and by integrating three sigmoid functions along with an additional layer output. Together, these components significantly improve model performance, enabling LGC-DBP to recognize and interpret the complex relationships within DBP features more accurately. In evaluation, LGC-DBP achieves an accuracy of 88.26% and a Matthews correlation coefficient of 0.701, both surpassing existing methods. These results demonstrate the model's strong capability for integrating and analyzing multi-dimensional data and mark a significant advance over traditional methods in capturing deeper, nonlinear interactions within the data.
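The idea of gating combined with multi-branch dilated convolution over a PSSM can be illustrated with a short sketch. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the kernel size of 3, the dilation rates (1, 2, 4), the channel counts, and the random weights are all hypothetical, and the GCN component, the LSTM, and the Improved Channel Attention mechanism are omitted.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used as the gate activation."""
    return 1.0 / (1.0 + np.exp(-z))


def dilated_conv1d(x, w, dilation=1):
    """Zero-padded 'same' 1D dilated convolution.

    x: (length, c_in) sequence features, e.g. a PSSM with c_in = 20.
    w: (kernel, c_in, c_out) weights; kernel is assumed to be odd.
    """
    length, _ = x.shape
    kernel, _, c_out = w.shape
    pad = dilation * (kernel - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((length, c_out))
    for k in range(kernel):
        start = k * dilation
        # Each kernel tap reads the input shifted by k * dilation positions.
        out += xp[start:start + length] @ w[k]
    return out


def gated_dilated_conv(x, w_feat, w_gate, dilation=1):
    """Feature branch multiplied elementwise by a sigmoid gate branch."""
    feat = dilated_conv1d(x, w_feat, dilation)
    gate = sigmoid(dilated_conv1d(x, w_gate, dilation))
    return feat * gate


def gated_inception_features(pssm, rng, dilations=(1, 2, 4), c_out=8):
    """Inception-style block: parallel gated branches, concatenated by channel.

    Weights are random placeholders (assumption); a real model learns them.
    """
    c_in = pssm.shape[1]
    branches = []
    for d in dilations:
        w_feat = rng.normal(scale=0.1, size=(3, c_in, c_out))
        w_gate = rng.normal(scale=0.1, size=(3, c_in, c_out))
        branches.append(gated_dilated_conv(pssm, w_feat, w_gate, dilation=d))
    return np.concatenate(branches, axis=1)


# Hypothetical input: a random stand-in for a 50-residue PSSM (50 x 20).
rng = np.random.default_rng(0)
pssm = rng.normal(size=(50, 20))
features = gated_inception_features(pssm, rng)
```

Each branch sees a different receptive field through its dilation rate, while the sigmoid gate scales every feature by a value between 0 and 1, so the block can suppress positions that are uninformative for a given channel.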