Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

Authors

Zhao Huiying, Wang Jihua, Zhou Yaoqi, Yang Yuedong

Affiliations

School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America; QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.

Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America; Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China.

Publication information

PLoS One. 2014 May 2;9(5):e96694. doi: 10.1371/journal.pone.0096694. eCollection 2014.

DOI:10.1371/journal.pone.0096694

PMID:24792350

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4008587/

Abstract

As increasingly inexpensive sequencing techniques uncover more and more protein sequences, determining their functions has become an urgent task. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than the low-resolution two-state (binding or non-binding) prediction that most existing techniques provide. The method first predicts the protein-DNA complex structure using the template-based structure prediction technique HHblits, and then predicts binding affinity with a knowledge-based energy function (the distance-scaled, finite ideal-gas reference state for protein-DNA interactions). Leave-one-out cross-validation on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides reasonably accurate prediction of DNA-binding residues based on the predicted protein-DNA complex structures. Applying it to the human proteome yields more than 300 novel predicted DNA-binding proteins; some of the predicted structures were validated against known structures of homologous proteins in apo form. The method, SPOT-Seq (DNA), is available as an online server at http://sparks-lab.org.
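The reported cross-validation figures can be checked against one another with a short calculation. The sketch below is illustrative only: the abstract gives percentages, not raw counts, so the true-positive and false-positive counts are back-calculated here from the rounded sensitivity and precision, and the function names are hypothetical rather than part of SPOT-Seq (DNA).

```python
import math


def confusion_from_rates(n_pos, n_neg, sensitivity, precision):
    """Back-calculate approximate confusion-matrix counts.

    Assumption: the counts are inferred from rounded percentages,
    so they may differ slightly from the paper's actual counts.
    """
    tp = round(sensitivity * n_pos)               # sensitivity = TP / (TP + FN)
    fn = n_pos - tp
    fp = round(tp * (1 - precision) / precision)  # precision = TP / (TP + FP)
    tn = n_neg - fp
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary classifier."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Figures from the abstract: 179 binding and 3797 non-binding domains,
# 65% sensitivity, 94% precision.
tp, fp, fn, tn = confusion_from_rates(179, 3797, 0.65, 0.94)
value = mcc(tp, fp, fn, tn)
print(tp, fp, fn, tn, round(value, 2))
```

Under these assumptions the inferred counts give an MCC that rounds to the reported 0.77, which suggests the three reported metrics are mutually consistent. Because the negative set is about 21 times larger than the positive set, MCC is a more informative summary here than plain accuracy.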

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1207/4008587/4e9ede3fc167/pone.0096694.g001.jpg
