School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA.
Proteins. 2010 Aug 1;78(10):2338-48. doi: 10.1002/prot.22746.
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from an improved understanding of how hydrophobic and polar interactions between amino-acid side chains stabilize protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for this coupling with a sequence profile derived from five-residue fragments in a fragment library that are structurally matched to the five-residue segments of a target structure. We further introduce a term that reduces low-complexity regions in designed sequences. These two terms, together with optimized reference states for amino-acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, increases the fraction of proteins whose designed sequences are more than 35% identical to the wild-type sequences by 12 percentage points (from 34% to 46%). It also reduces the fraction of designed sequences that are not homologous to any known protein sequence according to PSI-BLAST by 8 percentage points (from 22% to 14%). More importantly, sequences designed by RosettaDesign-SR have 2-3% more polar residues in the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to the wild-type sequences than those designed by RosettaDesign. Thus, proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to adopt unique structures because of more specific polar interactions.
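As a rough illustration of the fragment-based sequence profile described above, the Python sketch below builds a position-specific amino-acid frequency table from five-residue fragments matched to overlapping windows of a target backbone, scores a designed sequence against it, and computes the percent-identity and ">35% identity" fraction used in the evaluation. It is a hypothetical sketch, not the RosettaDesign-SR implementation: the input format (`matched_fragments` keyed by window start), the pseudocount, and the -log-frequency score are assumptions, and neither the structural fragment matching nor the low-complexity term is modeled.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def fragment_profile(target_len, matched_fragments, pseudocount=1.0):
    """Position-specific residue frequencies from matched five-residue fragments.

    Hypothetical input: matched_fragments maps a 0-based window start i to the
    fragment sequences whose structure matches target residues i..i+4.
    Non-standard letters are counted but excluded from the normalization.
    """
    counts = [Counter() for _ in range(target_len)]
    for start, fragments in matched_fragments.items():
        for frag in fragments:
            for offset, aa in enumerate(frag):
                counts[start + offset][aa] += 1  # each window votes for the residues it covers
    profile = []
    for pos_counts in counts:
        total = sum(pos_counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (pos_counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def profile_energy(sequence, profile):
    """Assumed score form: sum of -log frequency per position (lower is better)."""
    return -sum(math.log(profile[i][aa]) for i, aa in enumerate(sequence))


def percent_identity(designed, wild_type):
    """Identity of two equal-length, ungapped sequences (fixed-backbone design)."""
    if len(designed) != len(wild_type):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(designed, wild_type))
    return 100.0 * matches / len(wild_type)


def fraction_above(identities, threshold=35.0):
    """Fraction of proteins whose designed sequence exceeds the identity threshold."""
    return sum(x > threshold for x in identities) / len(identities)


# Toy 7-residue target with three overlapping windows (made-up fragments).
fragments = {0: ["AELKA", "AEIKA"], 1: ["ELKAG", "EVKAG"], 2: ["LKAGD", "LRAGD"]}
prof = fragment_profile(7, fragments)
print(round(prof[2]["L"], 3))  # → 0.192 (4 L votes + 1 pseudocount, over 6 + 20)
```

Overlapping windows let every target position collect evidence from up to five fragments, which is one simple way to turn local backbone-sequence coupling into a per-position preference.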