Graduate Program in Biological and Medical Informatics, University of California San Francisco, 600 16th Street, MC 2240, San Francisco, CA 94158, USA.
J Mol Biol. 2010 Sep 17;402(2):460-74. doi: 10.1016/j.jmb.2010.07.032. Epub 2010 Jul 21.
Protein-protein recognition, frequently mediated by members of large families of interaction domains, is one of the cornerstones of biological function. Here, we present a computational, structure-based method to predict the sequence space of peptides recognized by PDZ domains, one of the largest families of recognition proteins. As a test set, we use a considerable amount of recent phage display data that describe the peptide recognition preferences for 169 naturally occurring and engineered PDZ domains. For both wild-type PDZ domains and single point mutants, we find that 70-80% of the amino acids most frequently observed by phage display are predicted within the top five ranked amino acids. Phage display frequently identified recognition preferences for amino acids different from those present in the original crystal structure. Notably, in about half of these cases, our algorithm correctly captures these preferences, indicating that it can predict mutations that increase binding affinity relative to the starting structure. We also find that we can computationally recapitulate specificity changes upon mutation, a key test for successful forward design of protein-protein interface specificity. Across all evaluated data sets, we find that incorporating backbone sampling improves accuracy substantially, irrespective of whether a crystal or NMR structure is used as the starting conformation. Finally, we report successful prediction of several amino acid specificity changes from blind tests in the DREAM4 peptide recognition domain specificity prediction challenge. Because the foundational methods developed here are structure based, these results suggest that the approach can be more generally applied to specificity prediction and redesign of other protein-protein interfaces that have structural information but lack phage display data.
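The top-five evaluation summarized in the abstract can be illustrated with a short metric sketch: for each peptide position, check whether the amino acid most frequently observed by phage display appears among the first k residues of the predicted ranking. This is not the authors' code; the function name `top_k_recovery`, the position labels (`P0`, `P-1`, ...), and the toy rankings below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def top_k_recovery(
    observed: Dict[str, str],
    predicted: Dict[str, List[str]],
    k: int = 5,
) -> float:
    """Hypothetical helper: fraction of positions whose most frequent
    phage-display residue appears in the top-k predicted residues.

    observed:  position label -> most frequently observed amino acid
    predicted: position label -> predicted amino acids, best-ranked first
    """
    if not observed:
        return 0.0
    hits = 0
    for position, residue in observed.items():
        # Positions with no prediction count as misses.
        ranking = predicted.get(position, [])
        if residue in ranking[:k]:
            hits += 1
    return hits / len(observed)


# Toy, made-up data (not from the paper): C-terminal peptide positions.
observed = {"P0": "V", "P-1": "W", "P-2": "T", "P-3": "E"}
predicted = {
    "P0": ["V", "I", "L", "F", "A", "M"],
    "P-1": ["F", "Y", "I", "L", "V", "W"],  # W is ranked 6th
    "P-2": ["T", "S", "V", "A", "C"],
    "P-3": ["E", "D", "Q", "S", "T"],
}

print(top_k_recovery(observed, predicted))       # 0.75
print(top_k_recovery(observed, predicted, k=6))  # 1.0
```

In this toy case three of four positions are recovered within the top five, while widening the cutoff to six also captures the tryptophan at `P-1`. The choice of k directly trades stringency for coverage, which is why a fixed cutoff such as the top five is stated alongside the recovery percentage.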