A KAN-based hybrid deep neural networks for accurate identification of transcription factor binding sites.

Authors

He Guodong, Ye Jiahao, Hao Huijun, Chen Wei

Affiliation

School of Information Engineering, Wenzhou Business College, Wenzhou, Zhejiang, PR China.

Publication

PLoS One. 2025 May 7;20(5):e0322978. doi: 10.1371/journal.pone.0322978. eCollection 2025.

DOI:10.1371/journal.pone.0322978

PMID:40334196

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12058130/

Abstract

BACKGROUND

Predicting protein-DNA binding sites in vivo is a challenging but pressing task in fields such as drug design and development. Most promoters contain many transcription factor (TF) binding sites, yet only a few have been identified through time-consuming biochemical experiments. To close this gap, numerous computational approaches have been proposed to predict TF binding sites from DNA sequences. However, current deep learning methods often suffer from problems such as vanishing gradients as model depth increases, which leads to suboptimal feature extraction.

RESULTS

We propose a model called CBR-KAN (where C stands for Convolutional Neural Network (CNN), B for Bidirectional Long Short-Term Memory (BiLSTM), R for the residual mechanism, and KAN for Kolmogorov-Arnold Network) to predict transcription factor binding sites. Specifically, we designed a multi-scale convolution module (ConvBlock1, ConvBlock2, ConvBlock3) combined with a BiLSTM network, introduced a KAN to replace the traditional multilayer perceptron, and used residual connections to ease model optimization. Tests on 50 common ChIP-seq benchmark datasets show that CBR-KAN outperforms other state-of-the-art methods, including DeepBind, DanQ, DeepD2V, and DeepSEA, in predicting TF binding sites.
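The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind swapping a multilayer perceptron for a KAN-style layer and wrapping it in a residual connection. It is not the authors' code. The names `KANLayer`, `rbf_basis`, and `residual`, the Gaussian basis (used here in place of the B-splines of the usual KAN formulation), and all coefficient values are assumptions chosen for illustration.

```python
import math


def rbf_basis(x, centers, width):
    """Evaluate Gaussian bumps at x (an assumed stand-in for a spline basis)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class KANLayer:
    """Hypothetical KAN-style layer.

    Instead of a scalar weight followed by a fixed activation, every
    input-output edge carries its own learnable 1-D function
        phi_ji(x) = sum_k coef[j][i][k] * basis_k(x),
    and each output unit sums its incoming edge functions.
    """

    def __init__(self, coef, centers, width=1.0):
        self.coef = coef        # nested list, shape [n_out][n_in][n_basis]
        self.centers = centers  # basis centers shared by all edges
        self.width = width

    def forward(self, x):
        out = []
        for edge_coefs in self.coef:              # one output unit
            total = 0.0
            for xi, c in zip(x, edge_coefs):      # one edge per input
                basis = rbf_basis(xi, self.centers, self.width)
                total += sum(ck * bk for ck, bk in zip(c, basis))
            out.append(total)
        return out


def residual(layer, x):
    """Residual connection: add the layer input to the layer output.

    Requires the layer to preserve the vector length.
    """
    return [xi + yi for xi, yi in zip(x, layer.forward(x))]


# Illustrative usage with made-up coefficients (2 inputs -> 2 outputs).
centers = [-1.0, 0.0, 1.0]
coef = [
    [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],
    [[0.2, 0.0, 0.0], [0.0, 0.0, 0.3]],
]
layer = KANLayer(coef, centers)
features = residual(layer, [0.1, -0.4])
```

In a full model along the lines described above, the input vector would be the features produced by the convolution and BiLSTM stages, and the coefficients would be learned by gradient descent. The residual path gives gradients a direct route around the layer, which is the usual motivation for using it against vanishing gradients.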

CONCLUSIONS

The CBR-KAN model significantly improves prediction accuracy for transcription factor binding sites by effectively integrating multiple neural network architectures and mechanisms. This approach not only enhances feature extraction but also stabilizes training and boosts generalization capabilities. The promising results on multiple key performance indicators demonstrate the potential of CBR-KAN in bioinformatics applications.
