Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

Authors

Zhao Huiying, Wang Jihua, Zhou Yaoqi, Yang Yuedong

Affiliations

School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America; QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.

Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America; Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China.

Publication information

PLoS One. 2014 May 2;9(5):e96694. doi: 10.1371/journal.pone.0096694. eCollection 2014.

Abstract

As increasingly inexpensive sequencing techniques uncover more and more protein sequences, determining their functions has become an urgent task. This work presents a highly reliable computational technique that predicts DNA-binding function at the level of protein-DNA complex structures, rather than making the low-resolution two-state (binding or non-binding) prediction that most existing techniques make. The method first predicts the protein-DNA complex structure using the template-based structure prediction technique HHblits, and then predicts binding affinity using a knowledge-based energy function (distance-scaled finite ideal-gas reference state for protein-DNA interactions). Leave-one-out cross-validation of the method on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on the predicted DNA-binding complex structures. Applied to the human proteome, it predicts more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in apo forms. The method, SPOT-Seq (DNA), is available as an online server at http://sparks-lab.org.
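The reported precision, sensitivity, and MCC are all functions of one 2x2 confusion matrix, so they can be checked against each other. The sketch below is only an illustration: the abstract does not give the confusion matrix, and the counts used here (116 true positives, 7 false positives) are assumptions back-calculated from the rounded percentages and the 179/3797 domain counts. The function name `confusion_metrics` is likewise hypothetical.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Return (precision, sensitivity, MCC) for a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, sensitivity, mcc


# Dataset sizes stated in the abstract.
N_BINDING, N_NONBINDING = 179, 3797

# ASSUMED counts, not taken from the paper: chosen so that the rounded
# precision and sensitivity match the reported 94% and 65%.
tp, fp = 116, 7
fn = N_BINDING - tp      # binding domains that were missed
tn = N_NONBINDING - fp   # non-binding domains correctly rejected

precision, sensitivity, mcc = confusion_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f} MCC={mcc:.2f}")
```

With these assumed counts the three values round to 0.94, 0.65, and 0.77, which suggests the reported figures are mutually consistent. The MCC stays high despite the moderate sensitivity because the non-binding class is about 21 times larger than the binding class and produces very few false positives.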

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1207/4008587/4e9ede3fc167/pone.0096694.g001.jpg
