Zou Taisong, Woodrum Brian W, Halloran Nicholas, Campitelli Paul, Bobkov Andrey A, Ghirlanda Giovanna, Ozkan Sefika Banu
Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States.
School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States.
J Phys Chem B. 2021 Mar 18;125(10):2617-2626. doi: 10.1021/acs.jpcb.1c00364. Epub 2021 Mar 9.
Earlier experiments suggest that the evolutionary information (conservation and coevolution) encoded in protein sequences is necessary and sufficient to specify the fold of a protein family. However, no computational work has yet quantified the effect of such evolutionary information on the folding process. Here we explore the role of early folding steps for sequences designed using coevolution and conservation through a combination of computational and experimental methods. We simulated a repertoire of native and designed WW domain sequences to analyze early local contact formation and found that, in unfoldable sequences, strong non-native local contacts prevent the N-terminal β-hairpin turn from forming correctly. Through a maximum likelihood approach, we identified five local contacts that play a critical role in folding, suggesting that a small subset of amino acid pairs can be used to solve the "needle in the haystack" problem of designing foldable sequences. Using the contact probability of those five local contacts, which form during the early stage of folding, we built a classification model that predicts the foldability of a WW sequence with 81% accuracy. This classification model was used to redesign WW domain sequences that could not fold due to frustration, making them foldable by introducing a few mutations that stabilize these critical local contacts. The experimental analysis shows that a redesigned sequence folds and binds to polyproline peptides with an affinity similar to that observed for native WW domains. Overall, our analysis shows that evolutionarily designed sequences should not only satisfy folding stability but also ensure a minimally frustrated folding landscape.
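As a rough illustration of the classification step described in the abstract, the sketch below fits a two-class model on five contact-probability features and reports its accuracy. Everything in it is an assumption made for illustration: the abstract does not name the model family, so logistic regression fitted by maximum likelihood is used as a stand-in, the data are synthetic, and the helper names `fit_logistic_mle` and `predict_foldable` are hypothetical.

```python
import numpy as np


def fit_logistic_mle(X, y, lr=0.5, n_iter=2000):
    """Fit logistic-regression weights by gradient ascent on the log-likelihood.

    X: (n_sequences, n_contacts) contact probabilities in [0, 1].
    y: (n_sequences,) labels, 1 = foldable, 0 = unfoldable.
    Returns a weight vector whose first entry is the intercept.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # predicted P(foldable)
        w += lr * (Xb.T @ (y - p)) / len(y)  # gradient of the mean log-likelihood
    return w


def predict_foldable(X, w, threshold=0.5):
    """Return 1 where the predicted probability of folding reaches the threshold."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return (p >= threshold).astype(int)


# Synthetic stand-in data (NOT from the paper): 200 hypothetical sequences,
# each described by the probabilities of five early local contacts.
rng = np.random.default_rng(0)
n_sequences, n_contacts = 200, 5
y = rng.integers(0, 2, size=n_sequences)
# Assumption for illustration only: foldable sequences form these contacts
# more often on average than unfoldable ones.
X = np.clip(
    rng.normal(loc=0.35 + 0.3 * y[:, None], scale=0.15,
               size=(n_sequences, n_contacts)),
    0.0, 1.0,
)

w = fit_logistic_mle(X, y)
accuracy = float(np.mean(predict_foldable(X, w) == y))
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only how separable the synthetic classes were made and says nothing about the 81% reported for real WW sequences. In practice, the feature matrix would come from contact frequencies measured in folding simulations, and accuracy would be assessed on held-out sequences.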