DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches.

Affiliations

Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, 29208; Center for Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China.

Publication information

Proteins. 2013 Nov;81(11):1885-99. doi: 10.1002/prot.24330. Epub 2013 Aug 16.

DOI: 10.1002/prot.24330
PMID: 23737141
Abstract

Accurate prediction of DNA-binding residues has become an increasingly important problem in structural bioinformatics. Here, we present DNABind, a novel hybrid algorithm that identifies these crucial residues by exploiting the complementarity between machine learning-based and template-based methods. Our machine learning-based method is a probabilistic combination of a structure-based and a sequence-based predictor, both implemented with support vector machines. The former uses structural features, such as solvent accessibility, local geometry, topological features, and relative positions, that effectively quantify the difference between DNA-binding and nonbinding residues. The latter combines evolutionary conservation features with three other sequence attributes. Our template-based method relies on structural alignment and uses template structures from known protein-DNA complexes to infer DNA-binding residues. We show that the template method performs very well when reliable templates are found for the query protein, but that it tends to be strongly influenced by template quality and by conformational changes upon DNA binding. In contrast, the machine learning approach performs better when high-quality templates are not available (about one third of the cases in our dataset) or when the query protein undergoes substantial conformational changes upon DNA binding. Our extensive experiments indicate that the hybrid approach clearly improves on the individual methods for both bound and unbound structures. DNABind also significantly outperforms state-of-the-art algorithms, by around 10% in terms of the Matthews correlation coefficient. The proposed methodology could also be applied widely to the annotation of other protein functional sites. DNABind is freely available at http://mleg.cse.sc.edu/DNABind/.
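The hybrid idea in the abstract can be illustrated with a small Python sketch. It is not the DNABind implementation: the abstract does not say how the two SVM outputs are combined or how template reliability is judged, so the noisy-OR rule, the `template_score` quality measure, the fallback logic, and both 0.5 thresholds are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResidueEvidence:
    """Per-residue inputs to the sketch (all field names are hypothetical)."""
    p_structure: float                 # structure-based predictor output, assumed in [0, 1]
    p_sequence: float                  # sequence-based predictor output, assumed in [0, 1]
    template_binding: Optional[bool]   # label transferred from an aligned template, None if unaligned


def combine_probabilities(p_structure: float, p_sequence: float) -> float:
    """Noisy-OR combination of two probabilities (an assumed rule, not the paper's)."""
    return 1.0 - (1.0 - p_structure) * (1.0 - p_sequence)


def hybrid_predict(
    residues: List[ResidueEvidence],
    template_score: Optional[float],
    min_template_score: float = 0.5,   # assumed reliability threshold
    ml_cutoff: float = 0.5,            # assumed decision threshold
) -> List[bool]:
    """Trust the template when it looks reliable, otherwise use the ML score."""
    use_template = template_score is not None and template_score >= min_template_score
    predictions = []
    for residue in residues:
        if use_template and residue.template_binding is not None:
            # A reliable template covers this residue: transfer its label.
            predictions.append(residue.template_binding)
        else:
            # No usable template information: fall back to the combined ML score.
            combined = combine_probabilities(residue.p_structure, residue.p_sequence)
            predictions.append(combined >= ml_cutoff)
    return predictions


if __name__ == "__main__":
    example = [
        ResidueEvidence(0.9, 0.2, False),
        ResidueEvidence(0.1, 0.1, True),
        ResidueEvidence(0.6, 0.5, None),
    ]
    print(hybrid_predict(example, template_score=0.8))
    print(hybrid_predict(example, template_score=None))
```

The fallback design mirrors the complementarity reported in the abstract: the template label is used only where a sufficiently good template exists, and the machine learning score covers the remaining cases.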
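The Matthews correlation coefficient (MCC) mentioned in the abstract is a standard measure for binary classification on imbalanced data, such as binding versus nonbinding residues. The sketch below computes it from the confusion-matrix counts using the usual definition. The function name and the choice to return 0.0 when the denominator is zero are conventions assumed here, and the labels in the usage example are made up, not taken from the paper.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Labels are 1 (binding) or 0 (nonbinding). Returns 0.0 when the
    denominator is zero, a common convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Toy labels for four residues (illustrative only).
    truth = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(round(matthews_corrcoef(truth, predicted), 4))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than raw accuracy when binding residues are a small minority.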

