Improving computational protein design by using structure-derived sequence profile.

Affiliation

School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA.

Publication information

Proteins. 2010 Aug 1;78(10):2338-48. doi: 10.1002/prot.22746.

DOI:10.1002/prot.22746

PMID:20544969

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3058783/

Abstract

Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of how hydrophobic and polar interactions between amino-acid side chains stabilize protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low-complexity regions in designed sequences. These two terms, together with optimized reference states for amino-acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, raises the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences by 12 percentage points (from 34% to 46%). It also lowers the fraction of designed sequences that are not homologous to any known protein sequence, according to PSI-BLAST, by 8 percentage points (from 22% to 14%). More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
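The abstract does not say how the fragment-derived profile is built or weighted, so the following is only a minimal sketch of the general idea, not the RosettaDesign-SR implementation. It assumes the structural matching has already been done and is supplied as a hypothetical `matches` mapping (window start index to the sequences of matched library fragments), uses a flat pseudocount, and scores a sequence by its negative log-likelihood under the profile. The helper `sequence_identity` illustrates the identity measure behind the 35% threshold, assuming equal-length, pre-aligned sequences. All function and parameter names are illustrative.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 5  # fragment length stated in the abstract


def build_profile(target_len, matches, pseudocount=1.0):
    """Return one amino-acid frequency table per target position.

    `matches` (hypothetical input) maps a window start index to the
    sequences of library fragments structurally matched to that window.
    Overlapping windows all contribute counts to the positions they cover.
    """
    counts = [Counter() for _ in range(target_len)]
    for start, fragment_seqs in matches.items():
        for seq in fragment_seqs:
            if len(seq) != WINDOW:
                raise ValueError(f"expected {WINDOW}-residue fragment, got {seq!r}")
            for offset, aa in enumerate(seq):
                counts[start + offset][aa] += 1

    profile = []
    for c in counts:
        # Only the 20 standard residues are counted; the pseudocount
        # (an assumption) keeps every probability strictly positive.
        total = sum(c[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (c[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def profile_score(sequence, profile):
    """Negative log-likelihood of `sequence` under the profile (lower is better)."""
    return -sum(math.log(profile[i][aa]) for i, aa in enumerate(sequence))


def sequence_identity(designed, wild_type):
    """Fraction of identical positions between two equal-length sequences."""
    if len(designed) != len(wild_type):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(designed, wild_type)) / len(designed)


if __name__ == "__main__":
    # Toy data: a 6-residue target covered by two overlapping windows.
    toy_matches = {0: ["ACDEF", "ACDEG"], 1: ["CDEFG"]}
    toy_profile = build_profile(6, toy_matches)
    print(profile_score("ACDEFG", toy_profile))
    print(sequence_identity("ACDEFG", "ACDKFG"))
```

In a design program, a score of this kind would typically be added as one weighted term of the overall energy function; the weighting and the low-complexity term mentioned in the abstract are not sketched here because the abstract gives no details about them.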

Similar articles

Improving computational protein design by using structure-derived sequence profile.

Proteins. 2010 Aug 1;78(10):2338-48. doi: 10.1002/prot.22746.

Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles.

Proteins. 2014 Oct;82(10):2565-73. doi: 10.1002/prot.24620. Epub 2014 Jun 19.

Identification of amino acids involved in protein structural uniqueness: implication for de novo protein design.

Protein Eng. 2002 Jul;15(7):555-60. doi: 10.1093/protein/15.7.555.

Solution structure of a de novo protein from a designed combinatorial library.

Proc Natl Acad Sci U S A. 2003 Nov 11;100(23):13270-3. doi: 10.1073/pnas.1835644100. Epub 2003 Oct 30.

Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions.

Proteins. 2009 Jun;75(4):870-84. doi: 10.1002/prot.22296.

Protein Folding Prediction in a Cubic Lattice in Hydrophobic-Polar Model.

J Comput Biol. 2017 May;24(5):412-421. doi: 10.1089/cmb.2016.0181. Epub 2016 Nov 30.

NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities.

Proteins. 2005 Feb 15;58(3):628-37. doi: 10.1002/prot.20359.

Perturbing the energy landscape for improved packing during computational protein design.

Proteins. 2021 Apr;89(4):436-449. doi: 10.1002/prot.26030. Epub 2020 Dec 11.

Correlation between sequence hydrophobicity and surface-exposure pattern of database proteins.

Protein Sci. 2004 Mar;13(3):752-62. doi: 10.1110/ps.03431704. Epub 2004 Feb 6.

Defining the minimum size of a hydrophobic cluster in two-stranded alpha-helical coiled-coils: effects on protein stability.

Protein Sci. 2004 Mar;13(3):714-26. doi: 10.1110/ps.03443204.

Cited by

The 3-ketoacyl-CoA thiolase: an engineered enzyme for carbon chain elongation of chemical compounds.

Appl Microbiol Biotechnol. 2020 Oct;104(19):8117-8129. doi: 10.1007/s00253-020-10848-w. Epub 2020 Aug 24.

ProDCoNN: Protein design using a convolutional neural network.

Proteins. 2020 Jul;88(7):819-829. doi: 10.1002/prot.25868. Epub 2020 Jan 6.

Use of designed sequences in protein structure recognition.

Biol Direct. 2018 May 9;13(1):8. doi: 10.1186/s13062-018-0209-6.

PyIgClassify: a database of antibody CDR structural classifications.

Nucleic Acids Res. 2015 Jan;43(Database issue):D432-8. doi: 10.1093/nar/gku1106. Epub 2014 Nov 11.

Detecting local residue environment similarity for recognizing near-native structure models.

Proteins. 2014 Dec;82(12):3255-72. doi: 10.1002/prot.24658. Epub 2014 Oct 30.

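Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles.

Proteins. 2014 Oct;82(10):2565-73. doi: 10.1002/prot.24620. Epub 2014 Jun 19.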

Applying physics-based scoring to calculate free energies of binding for single amino acid mutations in protein-protein complexes.

PLoS One. 2013 Dec 10;8(12):e82849. doi: 10.1371/journal.pone.0082849. eCollection 2013.

DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels.

Genome Biol. 2013 Mar 13;14(3):R23. doi: 10.1186/gb-2013-14-3-r23.

Energy functions in de novo protein design: current challenges and future prospects.

Annu Rev Biophys. 2013;42:315-35. doi: 10.1146/annurev-biophys-083012-130315. Epub 2013 Feb 28.

Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations.

J Mol Biol. 2011 May 6;408(3):585-95. doi: 10.1016/j.jmb.2011.02.056. Epub 2011 Mar 2.

References

Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction.

Structure. 2009 Nov 11;17(11):1515-27. doi: 10.1016/j.str.2009.09.006.

Computational design of ligand binding is not a solved problem.

Proc Natl Acad Sci U S A. 2009 Nov 3;106(44):18491-6. doi: 10.1073/pnas.0907950106. Epub 2009 Oct 15.

Backbone flexibility in computational protein design.

Curr Opin Biotechnol. 2009 Aug;20(4):420-8. doi: 10.1016/j.copbio.2009.07.006. Epub 2009 Aug 24.

Computational protein design as a tool for fold recognition.

Proteins. 2009 Oct;77(1):139-58. doi: 10.1002/prot.22426.

Design of protein-interaction specificity gives selective bZIP-binding peptides.

Nature. 2009 Apr 16;458(7240):859-64. doi: 10.1038/nature07885.

Challenges in the computational design of proteins.

J R Soc Interface. 2009 Aug 6;6 Suppl 4(Suppl 4):S477-91. doi: 10.1098/rsif.2008.0508.focus. Epub 2009 Mar 11.

Computer-based redesign of a beta sandwich protein suggests that extensive negative design is not required for de novo beta sheet design.

Structure. 2008 Dec 10;16(12):1799-805. doi: 10.1016/j.str.2008.09.013.

Computational design of calmodulin mutants with up to 900-fold increase in binding specificity.

J Mol Biol. 2009 Feb 6;385(5):1470-80. doi: 10.1016/j.jmb.2008.09.053. Epub 2008 Sep 27.

Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities.

Proteins. 2009 Mar;74(4):820-36. doi: 10.1002/prot.22191.

Reconstruction of protein backbones from the BriX collection of canonical protein fragments.

PLoS Comput Biol. 2008 May 23;4(5):e1000083. doi: 10.1371/journal.pcbi.1000083.
