Department of Biochemistry, University of Washington, Seattle, WA 98195.
Institute for Protein Design, University of Washington, Seattle, WA 98195.
Proc Natl Acad Sci U S A. 2023 Feb 28;120(9):e2216697120. doi: 10.1073/pnas.2216697120. Epub 2023 Feb 21.
Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been difficult to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing peptides that bind SH3 and PDZ domains from those that do not. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where fewer experimental data are available.
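The joint objective described above (classification plus structure prediction accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the classifier, the input it reads from the network, or how the two loss terms are weighted. The sketch assumes a logistic head over a single scalar feature derived from the network output, and every name in it (`binder_probability`, `combined_loss`, `confidence_feature`, `class_weight`) is hypothetical.

```python
import math


def binder_probability(confidence_feature, slope, intercept):
    """Logistic head: map a scalar network-derived feature to P(binder).

    Assumption: the classifier reads one scalar per peptide-protein pair.
    """
    return 1.0 / (1.0 + math.exp(-(slope * confidence_feature + intercept)))


def binary_cross_entropy(p, label, eps=1e-9):
    """Cross-entropy between predicted probability p and a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def combined_loss(structure_loss, confidence_feature, label,
                  slope, intercept, class_weight=1.0):
    """Sum of a structure loss and a weighted classification loss.

    Assumption: the two terms are combined by a simple weighted sum;
    structure_loss stands in for whatever the structure network reports.
    """
    p = binder_probability(confidence_feature, slope, intercept)
    return structure_loss + class_weight * binary_cross_entropy(p, label)
```

In a fine-tuning loop, a loss of this shape would be minimized with respect to both the head parameters (`slope`, `intercept`) and the network parameters that produce `structure_loss` and `confidence_feature`, so that binding labels can influence the structural model as well as the classifier.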