Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning.

Authors

Li Guobin, Du Xiuquan, Li Xinlu, Zou Le, Zhang Guanhong, Wu Zhize

Affiliations

School of Artificial Intelligence and Big Data, Hefei University, Hefei, China.

School of Computer Science and Technology, Anhui University, Hefei, China.

Publication

PeerJ. 2021 May 3;9:e11262. doi: 10.7717/peerj.11262. eCollection 2021.

DOI:10.7717/peerj.11262
PMID:33986992
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8101451/
Abstract

DNA-binding proteins (DBPs) play pivotal roles in many biological functions, such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual feature extraction or fail to capture long-term dependencies in the sequence. In this paper, we propose a method called PDBP-Fusion that identifies DBPs by fusing local features and long-term dependencies learned only from primary sequences. We use a convolutional neural network (CNN) to learn local features and a bidirectional long short-term memory network (Bi-LSTM) to capture critical long-term dependencies in context. In addition, feature extraction, model training, and model prediction are performed simultaneously. PDBP-Fusion predicts DBPs with 86.45% sensitivity, 79.13% specificity, 82.81% accuracy, and a Matthews correlation coefficient (MCC) of 0.661 on the PDB14189 benchmark dataset. The MCC of the proposed method is at least 9.1% higher than that of other advanced prediction models. PDBP-Fusion also achieves superior performance and model robustness on the PDB2272 independent dataset. These results demonstrate that PDBP-Fusion can predict DBPs from sequences accurately and effectively. The online server is at http://119.45.144.26:8080/PDBP-Fusion/.
