iDNAProt-ES: Identification of DNA-binding Proteins Using Evolutionary and Structural Features.

Affiliations

Department of Computer Science and Engineering, United International University, House 80, Road 8A, Dhanmondi, Dhaka, 1209, Bangladesh.

Department of Computer Science, Morgan State University, Baltimore, Maryland, United States.

Publication information

Sci Rep. 2017 Nov 2;7(1):14938. doi: 10.1038/s41598-017-14945-1.

DOI: 10.1038/s41598-017-14945-1
PMID: 29097781
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5668250/
Abstract

DNA-binding proteins play a very important role in the structural composition of DNA. In addition, they regulate and affect various cellular processes such as transcription, DNA replication, DNA recombination, repair and modification. The experimental methods used to identify DNA-binding proteins are expensive and time-consuming, which has attracted researchers from the computational field to address the problem. In this paper, we present iDNAProt-ES, a DNA-binding protein prediction method that utilizes both sequence-based evolutionary features and structure-based features of proteins to identify their DNA-binding functionality. We used recursive feature elimination to extract an optimal set of features and trained a Support Vector Machine (SVM) with a linear kernel on them to select the final model. Our proposed method significantly outperforms the existing state-of-the-art predictors on a standard benchmark dataset. The accuracy of the predictor is 90.18% using the jackknife test and 88.87% using 10-fold cross-validation on the benchmark dataset. The accuracy of the predictor on the independent dataset is 80.64%, which is also significantly better than the state-of-the-art methods. iDNAProt-ES is a novel prediction method that uses evolutionary and structure-based features. We believe the superior performance of iDNAProt-ES will motivate researchers to use this method to identify DNA-binding proteins. iDNAProt-ES is publicly available as a web server at: http://brl.uiu.ac.bd/iDNAProt-ES/ .
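
The workflow named in the abstract (recursive feature elimination, a linear classifier, then accuracy under 10-fold cross-validation and a jackknife test) can be illustrated on toy data. The following is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation. The features are synthetic stand-ins for the evolutionary and structural features. A simple difference-of-class-means linear scorer is swapped in for the linear-kernel SVM. The jackknife test is treated as leave-one-out cross-validation. Feature selection is rerun inside every fold, which is a choice of this sketch rather than something the abstract specifies. All function names (`make_toy_data`, `fit_linear`, `predict`, `rfe`, `cv_accuracy`) are hypothetical.

```python
"""Toy sketch: recursive feature elimination + a linear classifier,
scored with 10-fold cross-validation and a jackknife (leave-one-out) test.
Illustrative only; this is not the iDNAProt-ES implementation."""


def make_toy_data(n=20):
    """Deterministic stand-in data: features 0 and 1 carry the label, 2 and 3 do not."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        s = 1.0 if label else -1.0
        pair = i // 2  # samples 2m and 2m+1 share the same noise values
        X.append([
            2.0 * s + ((i * 7) % 5 - 2) / 10,  # strongly informative
            1.0 * s + ((i * 3) % 5 - 2) / 10,  # weakly informative
            (pair % 3 - 1) * 0.5,              # noise
            pair % 2 - 0.5,                    # noise
        ])
        y.append(label)
    return X, y


def fit_linear(X, y):
    """Linear scorer (assumed stand-in for an SVM): w = mean(pos) - mean(neg),
    with the decision boundary through the midpoint of the two class means."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    d = len(X[0])
    mean_pos = [sum(x[j] for x in pos) / len(pos) for j in range(d)]
    mean_neg = [sum(x[j] for x in neg) / len(neg) for j in range(d)]
    w = [p - q for p, q in zip(mean_pos, mean_neg)]
    b = -sum(wj * (p + q) / 2 for wj, p, q in zip(w, mean_pos, mean_neg))
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the feature with the
    smallest absolute weight, and repeat until n_keep features remain."""
    keep = list(range(len(X[0])))
    while len(keep) > n_keep:
        w, _ = fit_linear([[x[j] for j in keep] for x in X], y)
        worst = min(range(len(keep)), key=lambda i: abs(w[i]))
        del keep[worst]
    return keep


def cv_accuracy(X, y, k, n_keep):
    """k-fold cross-validated accuracy; k == len(X) gives leave-one-out."""
    n = len(X)
    correct = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        keep = rfe(X_tr, y_tr, n_keep)  # select features on training data only
        model = fit_linear([[x[j] for j in keep] for x in X_tr], y_tr)
        correct += sum(predict(model, [X[i][j] for j in keep]) == y[i] for i in test)
    return correct / n


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected features:", rfe(X, y, n_keep=2))
    print("10-fold accuracy:", cv_accuracy(X, y, k=10, n_keep=2))
    print("jackknife accuracy:", cv_accuracy(X, y, k=len(X), n_keep=2))
```

On this deliberately easy toy data the two informative features are retained and both evaluation protocols classify every held-out sample correctly. Real protein feature sets would not separate this cleanly, and the accuracies reported in the abstract come from the authors' benchmark and independent datasets, not from anything resembling this example.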

Similar articles

1
iDNAProt-ES: Identification of DNA-binding Proteins Using Evolutionary and Structural Features.
Sci Rep. 2017 Nov 2;7(1):14938. doi: 10.1038/s41598-017-14945-1.
2
DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.
J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.
3
DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.
J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23.
4
Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.
Proteins. 2006 Jul 1;64(1):19-27. doi: 10.1002/prot.20977.
5
iPHLoc-ES: Identification of bacteriophage protein locations using evolutionary and structural features.
J Theor Biol. 2017 Dec 21;435:229-237. doi: 10.1016/j.jtbi.2017.09.022. Epub 2017 Sep 21.
6
A feature-based approach to predict hot spots in protein-DNA binding interfaces.
Brief Bioinform. 2020 May 21;21(3):1038-1046. doi: 10.1093/bib/bbz037.
7
TargetDBP: Accurate DNA-Binding Protein Prediction Via Sequence-Based Multi-View Feature Learning.
IEEE/ACM Trans Comput Biol Bioinform. 2020 Jul-Aug;17(4):1419-1429. doi: 10.1109/TCBB.2019.2893634. Epub 2019 Jan 18.
8
iProtGly-SS: Identifying protein glycation sites using sequence and structure based features.
Proteins. 2018 Jul;86(7):777-789. doi: 10.1002/prot.25511. Epub 2018 May 2.
9
DNAPred: Accurate Identification of DNA-Binding Sites from Protein Sequence by Ensembled Hyperplane-Distance-Based Support Vector Machines.
J Chem Inf Model. 2019 Jun 24;59(6):3057-3071. doi: 10.1021/acs.jcim.8b00749. Epub 2019 Apr 16.
10
PSFM-DBT: Identifying DNA-Binding Proteins by Combing Position Specific Frequency Matrix and Distance-Bigram Transformation.
Int J Mol Sci. 2017 Aug 25;18(9):1856. doi: 10.3390/ijms18091856.

Cited by

1
GRU4ACE: Enhancing ACE inhibitory peptide prediction by integrating gated recurrent unit with multi-source feature embeddings.
Protein Sci. 2025 Jun;34(6):e70026. doi: 10.1002/pro.70026.
2
Accurate prediction of nucleic acid binding proteins using protein language model.
Bioinform Adv. 2025 Jan 20;5(1):vbaf008. doi: 10.1093/bioadv/vbaf008. eCollection 2025.
3
Benchmarking recent computational tools for DNA-binding protein identification.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.
4
Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein.
Nat Commun. 2024 Sep 7;15(1):7838. doi: 10.1038/s41467-024-52293-7.
5
LGC-DBP: the method of DNA-binding protein identification based on PSSM and deep learning.
Front Genet. 2024 Jun 5;15:1411847. doi: 10.3389/fgene.2024.1411847. eCollection 2024.
6
Improved prediction of DNA and RNA binding proteins with deep learning models.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae285.
7
StackDPP: a stacking ensemble based DNA-binding protein prediction model.
BMC Bioinformatics. 2024 Mar 14;25(1):111. doi: 10.1186/s12859-024-05714-9.
8
HormoNet: a deep learning approach for hormone-drug interaction prediction.
BMC Bioinformatics. 2024 Feb 28;25(1):87. doi: 10.1186/s12859-024-05708-7.
9
Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features.
Sci Rep. 2024 Feb 5;14(1):2961. doi: 10.1038/s41598-024-52653-9.
10
Phonon-assisted nearly pure spin current in DNA molecular chains: a multifractal analysis.
Sci Rep. 2023 Dec 2;13(1):21281. doi: 10.1038/s41598-023-48644-x.