Wang Eric
Institute for Medical Engineering and Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
Bioinform Adv. 2023 Jan 2;3(1):vbac103. doi: 10.1093/bioadv/vbac103. eCollection 2023.
The ability to predict antibody-antigen binding is essential for computational models of antibody affinity maturation and protein design. While most models aim to predict binding for arbitrary antigens and antibodies, the global impact of SARS-CoV-2 on public health and the availability of associated data suggest that a SARS-CoV-2-specific model would be highly beneficial. In this work, we present a neural network model, trained on ∼315 000 datapoints from deep mutational scanning experiments, that predicts the escape fractions of SARS-CoV-2 receptor-binding domains (RBDs) binding to arbitrary antibodies. The antibody embeddings learned by the model define an effective sequence space in which distances correlate with the Hamming distances between antibody sequences, suggesting that these embeddings may be useful for downstream tasks such as binding prediction. Indeed, the model achieves Spearman correlation coefficients of 0.46 and 0.52 on two held-out test sets, whereas correlation coefficients calculated using existing structure- and sequence-based models do not exceed 0.28. The model's predictions also correlate with the dissociation constants of antibodies binding to SARS-CoV-2 RBD variants, with a correlation coefficient of 0.46. Additionally, the predicted residue-level escapes are highest in the antibody epitope and correlate well with experimentally measured escapes. We further study the effects of antibody chain usage, embedding dimension and feed-forward versus convolutional architectures on model performance. Lastly, we find that our model's inference is significantly faster than that of previous models, suggesting that it could be a useful tool for the accurate and rapid prediction of antibody binding to SARS-CoV-2 RBDs.
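The abstract relies on two rank-based comparisons: distances in the learned antibody embedding space are compared with Hamming distances between antibody sequences, and predicted escape fractions are scored against measurements with Spearman's rank correlation. The sketch below shows both computations under stated assumptions: sequences are pre-aligned to equal length, embedding distance is Euclidean, and ties receive average ranks. All function names are hypothetical and are not taken from the RBD_AB repository; the model's actual distance metric and evaluation code may differ.

```python
import math
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length (aligned) sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def euclidean(u, v) -> float:
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(ranks(x), ranks(y))


def embedding_vs_hamming(seqs, embeddings) -> float:
    """Rank correlation between pairwise Hamming and embedding distances."""
    ham, emb = [], []
    for i, j in combinations(range(len(seqs)), 2):
        ham.append(hamming(seqs[i], seqs[j]))
        emb.append(euclidean(embeddings[i], embeddings[j]))
    return spearman(ham, emb)


if __name__ == "__main__":
    # Toy data only: four short "sequences" with 1-D "embeddings".
    seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
    embeddings = [[0.0], [1.0], [2.0], [4.0]]
    # ≈ 1.0: the embedding distances preserve the Hamming ordering exactly.
    print(embedding_vs_hamming(seqs, embeddings))
```

The same `spearman` function would score a held-out test set by calling `spearman(predicted, measured)` on paired escape fractions. Rank correlation is a natural choice here because it rewards getting the ordering of escape fractions right without assuming a linear relationship between predicted and measured values.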
The model and associated code are available for download at https://github.com/ericzwang/RBD_AB.
Supplementary data are available at Bioinformatics Advances online.