He Guodong, Ye Jiahao, Hao Huijun, Chen Wei
School of Information Engineering, Wenzhou Business College, Wenzhou, Zhejiang, PR China.
PLoS One. 2025 May 7;20(5):e0322978. doi: 10.1371/journal.pone.0322978. eCollection 2025.
Predicting protein-DNA binding sites in vivo is a challenging and pressing task in fields such as drug design and development. Most promoters contain many transcription factor (TF) binding sites, yet only a few have been identified through time-consuming biochemical experiments. To address this challenge, numerous computational approaches have been proposed to predict TF binding sites from DNA sequences. However, current deep learning methods often suffer from vanishing gradients as model depth increases, which leads to suboptimal feature extraction.
We propose a model called CBR-KAN (where C stands for convolutional neural network (CNN), B for bidirectional long short-term memory (BiLSTM), and R for residual mechanism) to predict transcription factor binding sites. Specifically, we designed a multi-scale convolution module (ConvBlock1, 2, 3) combined with a BiLSTM network, introduced a Kolmogorov-Arnold Network (KAN) to replace the traditional multilayer perceptron, and promoted model optimization through residual connections. Testing on 50 common ChIP-seq benchmark datasets shows that CBR-KAN outperforms other state-of-the-art methods such as DeepBind, DanQ, DeepD2V, and DeepSEA in predicting TF binding sites.
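As a rough illustration of the multi-scale convolution and residual ideas described above, the following pure-Python sketch one-hot encodes a DNA sequence, applies several convolution branches with different kernel widths, and adds the input back through a skip connection. It is not the authors' implementation: the kernel widths (3, 5, 7), the channel count, the random weights, the choice to sum branches rather than concatenate them, and all function names are assumptions made for illustration, and the BiLSTM and KAN stages are omitted.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_same(x, kernel):
    """Zero-padded 'same' 1D convolution.

    x is L x C_in; kernel is K x C_in x C_out with odd K; the result is L x C_out.
    """
    length, k = len(x), len(kernel)
    c_in, c_out = len(kernel[0]), len(kernel[0][0])
    pad = k // 2
    out = []
    for pos in range(length):
        row = [0.0] * c_out
        for offset in range(k):
            src = pos + offset - pad
            if 0 <= src < length:  # positions outside the sequence count as zeros
                for i in range(c_in):
                    for o in range(c_out):
                        row[o] += x[src][i] * kernel[offset][i][o]
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def random_kernel(rng, k, c_in, c_out):
    """Hypothetical small random weights; a trained model would learn these."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c_out)]
             for _ in range(c_in)] for _ in range(k)]

def multi_scale_residual_block(x, kernels):
    """Sum the ReLU outputs of each kernel-width branch, then add the input (skip connection).

    Each kernel must map C_in channels back to C_in channels so the shapes match.
    """
    out = [row[:] for row in x]  # residual path starts from a copy of the input
    for kernel in kernels:
        branch = relu(conv1d_same(x, kernel))
        for pos, row in enumerate(branch):
            for ch, v in enumerate(row):
                out[pos][ch] += v
    return out

# Usage: three branches with assumed kernel widths 3, 5, and 7.
rng = random.Random(0)
x = one_hot("ACGTTGCA")
kernels = [random_kernel(rng, k, 4, 4) for k in (3, 5, 7)]
features = multi_scale_residual_block(x, kernels)
```

Because the input is added back unchanged, the block reduces to the identity when every branch outputs zeros, which is the property that lets gradients pass through deep stacks of such blocks more easily.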
The CBR-KAN model significantly improves prediction accuracy for transcription factor binding sites by effectively integrating multiple neural network architectures and mechanisms. This approach not only enhances feature extraction but also stabilizes training and boosts generalization capabilities. The promising results on multiple key performance indicators demonstrate the potential of CBR-KAN in bioinformatics applications.