On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach.

Author information

Qu Yu-Hui, Yu Hua, Gong Xiu-Jun, Xu Jia-Hui, Lee Hong-Shun

Affiliations

School of Computer Science and Technology, Tianjin University, Nankai, Tianjin, China, 30072.

Tianjin Key Laboratory of Cognitive Computing and Application, Nankai, Tianjin, China, 30072.

Publication information

PLoS One. 2017 Dec 29;12(12):e0188129. doi: 10.1371/journal.pone.0188129. eCollection 2017.

DOI: 10.1371/journal.pone.0188129
PMID: 29287069
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5747425/
Abstract

DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often focus on extracting physicochemical features from sequences but ignore motif information and the location information between motifs. Meanwhile, small data volumes and high noise in the training data lead to lower prediction accuracy and reliability. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It uses two stages of convolutional neural networks to detect the functional domains of protein sequences, a long short-term memory (LSTM) neural network to capture their long-term dependencies, and binary cross-entropy to evaluate the quality of the networks. When tested on a realistic DNA-binding protein dataset, the proposed method achieves a prediction accuracy of 94.2% with a Matthews correlation coefficient (MCC) of 0.961. In independent tests on the Arabidopsis and yeast datasets, its accuracy is 9% and 4% higher, respectively, than that of LibSVM. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the other methods, while its sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21%, respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins.
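
The abstract reports its results as accuracy, sensitivity, specificity, AUC and the Matthews correlation coefficient (MCC), and uses binary cross-entropy to score the networks. As a reading aid for those figures, the following Python sketch computes the standard textbook definitions of these quantities from 0/1 labels (1 = DNA-binding) and predicted probabilities. It is not the authors' evaluation code; the function names, the 0.5 decision threshold and the toy data are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative helpers (hypothetical names). These are the standard metric
# definitions, not the evaluation code used in the paper.


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels, where 1 = DNA-binding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, MCC and AUC at an assumed decision threshold."""
    y_pred: List[int] = [1 if p >= threshold else 0 for p in y_prob]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on DNA-binding proteins
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on non-binding proteins
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "auc": roc_auc(y_true, y_prob),
    }


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
    print(classification_metrics(labels, probs))
    print(round(binary_cross_entropy(labels, probs), 4))
```

On the toy data, four of six predictions are correct, so accuracy is 0.667, while MCC = (2*2 - 1*1) / sqrt(3*3*3*3) = 1/3. Because MCC uses all four cells of the confusion matrix, it is commonly reported alongside accuracy when class sizes are unbalanced.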

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/e0119792da0c/pone.0188129.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/434acf412cab/pone.0188129.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/54d70e9a8336/pone.0188129.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/eac353ffc6e3/pone.0188129.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/3e5c3b104d03/pone.0188129.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/f7c5e622a7dd/pone.0188129.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/e0b74cec581d/pone.0188129.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/96bd/5747425/115a4c7dc5b7/pone.0188129.g008.jpg

Similar articles

1
On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach.
PLoS One. 2017 Dec 29;12(12):e0188129. doi: 10.1371/journal.pone.0188129. eCollection 2017.
2
Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning.
PeerJ. 2021 May 3;9:e11262. doi: 10.7717/peerj.11262. eCollection 2021.
3
Deep Neural Network Based Predictions of Protein Interactions Using Primary Sequences.
Molecules. 2018 Aug 1;23(8):1923. doi: 10.3390/molecules23081923.
4
An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences.
PLoS One. 2019 Nov 14;14(11):e0225317. doi: 10.1371/journal.pone.0225317. eCollection 2019.
5
Research on DNA-Binding Protein Identification Method Based on LSTM-CNN Feature Fusion.
Comput Math Methods Med. 2022 Jun 2;2022:9705275. doi: 10.1155/2022/9705275. eCollection 2022.
6
Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature.
Bioinformatics. 2009 Jan 1;25(1):30-5. doi: 10.1093/bioinformatics/btn583. Epub 2008 Nov 12.
7
Improving DNA-Binding Protein Prediction Using Three-Part Sequence-Order Feature Extraction and a Deep Neural Network Algorithm.
J Chem Inf Model. 2023 Feb 13;63(3):1044-1057. doi: 10.1021/acs.jcim.2c00943. Epub 2023 Jan 31.
8
Enabling full-length evolutionary profiles based deep convolutional neural network for predicting DNA-binding proteins from sequence.
Proteins. 2020 Jan;88(1):15-30. doi: 10.1002/prot.25763. Epub 2019 Jul 8.
9
Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.
10
Predicting protein-ligand binding residues with deep convolutional neural networks.
BMC Bioinformatics. 2019 Feb 26;20(1):93. doi: 10.1186/s12859-019-2672-1.

Cited by

1
DRBP-EDP: classification of DNA-binding proteins and RNA-binding proteins using ESM-2 and dual-path neural network.
NAR Genom Bioinform. 2025 May 19;7(2):lqaf058. doi: 10.1093/nargab/lqaf058. eCollection 2025 Jun.
2
Accurate prediction of nucleic acid binding proteins using protein language model.
Bioinform Adv. 2025 Jan 20;5(1):vbaf008. doi: 10.1093/bioadv/vbaf008. eCollection 2025.
3
Overview and Prospects of DNA Sequence Visualization.
Int J Mol Sci. 2025 Jan 8;26(2):477. doi: 10.3390/ijms26020477.
4
Decoding the enigmatic estrogen paradox in pulmonary hypertension: delving into estrogen metabolites and metabolic enzymes.
Cell Mol Biol Lett. 2024 Dec 18;29(1):155. doi: 10.1186/s11658-024-00671-w.
5
Benchmarking recent computational tools for DNA-binding protein identification.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.
6
Improved prediction of DNA and RNA binding proteins with deep learning models.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae285.
7
LABAMPsGCN: A framework for identifying lactic acid bacteria antimicrobial peptides based on graph convolutional neural network.
Front Genet. 2022 Nov 3;13:1062576. doi: 10.3389/fgene.2022.1062576. eCollection 2022.
8
A novel deep learning-assisted hybrid network for plasmodium falciparum parasite mitochondrial proteins classification.
PLoS One. 2022 Oct 6;17(10):e0275195. doi: 10.1371/journal.pone.0275195. eCollection 2022.
9
Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches.
Biomolecules. 2022 Aug 26;12(9):1187. doi: 10.3390/biom12091187.
10
BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN.
Biomed Res Int. 2022 Aug 24;2022:9015123. doi: 10.1155/2022/9015123. eCollection 2022.