How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

Author information

Tian Pengfei, Best Robert B

Affiliations

Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland.

Publication information

Biophys J. 2017 Oct 17;113(8):1719-1730. doi: 10.1016/j.bpj.2017.08.039.

Abstract

Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been studied extensively using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we quantitatively estimate the sequence capacity of 10 proteins with a variety of structures, using a statistical model based on residue-residue coevolution to capture the variation among sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding Avogadro's number (about 6 × 10^23). In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein or, more precisely, with the density of contacts. This allows the absolute sequence capacity of a given protein to be predicted approximately from its structure. On the other hand, the relative sequence capacity, that is, the absolute capacity normalized by the total number of possible sequences, is extremely small and strongly anti-correlated with protein length. Thus, although larger proteins may have more foldable sequences, those sequences will be much harder to find. Lastly, we correlate the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance.
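The abstract treats absolute sequence capacity as a count of foldable sequences, Ω, and relative sequence capacity as Ω divided by the 20^L possible sequences of length L. Because these numbers are astronomically large or small, they are easiest to handle as base-10 logarithms. The Python sketch below is a simplified illustration, not the authors' method: it swaps their coevolutionary (pairwise) model for an independent-site model, estimates the sequence entropy S as the sum of per-column Shannon entropies of a hypothetical gap-free alignment, and takes Ω ≈ e^S. The function names, the pseudocount, and the restriction to the 20 standard residues are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumption: only the 20 standard amino acids; gaps and other symbols are rejected.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AVOGADRO = 6.02214076e23  # exact SI value, mol^-1


def _validate(msa):
    """Require a non-empty, gap-free alignment of equal-length sequences."""
    if not msa:
        raise ValueError("alignment is empty")
    allowed = set(AMINO_ACIDS)
    length = len(msa[0])
    for seq in msa:
        if len(seq) != length or not set(seq) <= allowed:
            raise ValueError("sequences must be equal length and use only standard residues")
    return length


def site_entropies(msa, pseudocount=0.5):
    """Shannon entropy (nats) of each alignment column, with a uniform pseudocount."""
    length = _validate(msa)
    total = len(msa) + pseudocount * len(AMINO_ACIDS)
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa)
        h = 0.0
        for aa in AMINO_ACIDS:
            p = (counts[aa] + pseudocount) / total
            if p > 0.0:  # 0 * log 0 is taken as 0
                h -= p * math.log(p)
        entropies.append(h)
    return entropies


def log10_capacity(msa, pseudocount=0.5):
    """log10 of Omega ~ exp(S), with S the summed independent-site entropy."""
    return sum(site_entropies(msa, pseudocount)) / math.log(10)


def log10_relative_capacity(msa, pseudocount=0.5):
    """log10(Omega / 20**L): capacity as a fraction of all length-L sequences."""
    length = _validate(msa)
    return log10_capacity(msa, pseudocount) - length * math.log10(len(AMINO_ACIDS))


def exceeds_avogadro(msa, pseudocount=0.5):
    """Compare the estimated capacity with Avogadro's number on a log scale."""
    return log10_capacity(msa, pseudocount) > math.log10(AVOGADRO)


if __name__ == "__main__":
    # Toy alignment: every column contains each residue exactly once (maximal entropy).
    toy = [AMINO_ACIDS[k:] + AMINO_ACIDS[:k] for k in range(20)]
    print(round(log10_capacity(toy, pseudocount=0.0), 2))
    print(round(log10_relative_capacity(toy, pseudocount=0.0), 2))
    print(exceeds_avogadro(toy, pseudocount=0.0))
```

Adding pairwise constraints can only lower the maximum attainable entropy, so this independent-site estimate is roughly an upper bound on what a pairwise (coevolutionary) maximum-entropy model fit to the same single-site frequencies would give. The relative capacity, log10 Ω − L·log10 20, becomes more negative as L grows unless the per-site entropy approaches ln 20, which is one way to see why relative capacity falls with protein length.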