Nonlinearities in protein space limit the utility of informatics in protein biophysics.

Author information

Rackovsky S

Affiliations

Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, 14853.

Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, 10029.

Publication information

Proteins. 2015 Nov;83(11):1923-8. doi: 10.1002/prot.24916. Epub 2015 Sep 10.

DOI:10.1002/prot.24916

PMID:26315852

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4609284/

Abstract

We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
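The two nonlinear regions named in the abstract can be pictured as the off-diagonal quadrants of a sequence-distance versus structure-distance plane. The following is a minimal sketch of that idea only, not the author's method: the function names, the cutoff values, and the toy distances are all hypothetical, and the paper's actual metric functions are not reproduced here.

```python
from collections import Counter
from enum import Enum


class Region(Enum):
    """Four quadrants of a (sequence distance, structure distance) plane."""
    CONSISTENT_SIMILAR = "similar sequence, similar structure"
    CONFORMATIONAL_SWITCH = "similar sequence, dissimilar structure"
    REMOTE_HOMOLOGY = "dissimilar sequence, similar structure"
    CONSISTENT_DISSIMILAR = "dissimilar sequence, dissimilar structure"


def classify_pair(seq_dist, struct_dist, seq_cutoff, struct_cutoff):
    """Assign one protein pair to a quadrant.

    The cutoffs are hypothetical parameters; the source does not state
    how its regions are delimited.
    """
    seq_close = seq_dist <= seq_cutoff
    struct_close = struct_dist <= struct_cutoff
    if seq_close and struct_close:
        return Region.CONSISTENT_SIMILAR
    if seq_close:
        return Region.CONFORMATIONAL_SWITCH
    if struct_close:
        return Region.REMOTE_HOMOLOGY
    return Region.CONSISTENT_DISSIMILAR


def region_counts(pairs, seq_cutoff, struct_cutoff):
    """Count how many (seq_dist, struct_dist) pairs fall in each quadrant."""
    return Counter(
        classify_pair(s, t, seq_cutoff, struct_cutoff) for s, t in pairs
    )


# Made-up distances for illustration only; not data from the paper.
toy_pairs = [(0.1, 0.2), (0.2, 0.9), (0.8, 0.1), (0.9, 0.8)]
counts = region_counts(toy_pairs, seq_cutoff=0.5, struct_cutoff=0.5)
nonlinear = counts[Region.CONFORMATIONAL_SWITCH] + counts[Region.REMOTE_HOMOLOGY]
print(nonlinear / len(toy_pairs))
```

The sketch classifies pairs of proteins, whereas the abstract reports the fraction of individual proteins falling in these regions, so the printed ratio is only an analogy for that statistic.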

Similar articles

Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability.

J Mol Biol. 2014 Feb 20;426(4):962-79. doi: 10.1016/j.jmb.2013.11.026. Epub 2013 Dec 4.

Computing motif correlations in proteins.

J Comput Chem. 2003 Dec;24(16):2032-43. doi: 10.1002/jcc.10332.

Sequence-based protein structure prediction using a reduced state-space hidden Markov model.

Comput Biol Med. 2007 Sep;37(9):1211-24. doi: 10.1016/j.compbiomed.2006.10.014. Epub 2006 Dec 11.

Beyond the Twilight Zone: automated prediction of structural properties of proteins by recursive neural networks and remote homology information.

Proteins. 2009 Oct;77(1):181-90. doi: 10.1002/prot.22429.

Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins.

Mol Biosyst. 2012 Aug;8(8):2076-84. doi: 10.1039/c2mb25113b. Epub 2012 Jun 13.

DBAli tools: mining the protein structure space.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W393-7. doi: 10.1093/nar/gkm236. Epub 2007 May 3.

Protein sequence randomness and sequence/structure correlations.

Biophys J. 1995 Apr;68(4):1531-9. doi: 10.1016/S0006-3495(95)80325-5.

Structural analysis of the PsbQ protein of photosystem II by Fourier transform infrared and circular dichroic spectroscopy and by bioinformatic methods.

Biochemistry. 2003 Feb 4;42(4):1000-7. doi: 10.1021/bi026575l.

Sequence determinants of protein architecture.

Proteins. 2013 Oct;81(10):1681-5. doi: 10.1002/prot.24328. Epub 2013 Aug 13.

Cited by

Application of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences.

Proteins. 2024 Oct;92(10):1234-1241. doi: 10.1002/prot.26704. Epub 2024 May 29.

Design and characterization of a protein fold switching network.

Nat Commun. 2023 Jan 26;14(1):431. doi: 10.1038/s41467-023-36065-3.

Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds.

BMC Evol Biol. 2019 Jul 30;19(1):158. doi: 10.1186/s12862-019-1464-6.

Homology modeling in a dynamical world.

Protein Sci. 2017 Nov;26(11):2195-2206. doi: 10.1002/pro.3274. Epub 2017 Sep 28.

Global informatics and physical property selection in protein sequences.

Proc Natl Acad Sci U S A. 2016 Feb 16;113(7):1808-10. doi: 10.1073/pnas.1525745113. Epub 2016 Feb 1.
