Siwek Jane C, Omelchenko Alisa A, Chhibbar Prabal, Arshad Sanya, Rosengart AnnaElaine, Nazarali Iliyan, Patel Akash, Nazarali Kiran, Rahimikollu Javad, Tilstra Jeremy S, Shlomchik Mark J, Koes David R, Joglekar Alok V, Das Jishnu
Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
Nat Methods. 2025 Jul 28. doi: 10.1038/s41592-025-02723-1.
Protein language models embed protein sequences for different tasks. However, these models are suboptimal at learning the language of protein interactions. We developed an interaction language model (iLM), Sliding Window Interaction Grammar (SWING), that leverages differences in amino-acid properties to generate an interaction vocabulary. SWING successfully predicted both class I and class II peptide-major histocompatibility complex interactions. Furthermore, the class I SWING model could uniquely cross-predict class II interactions, a complex prediction task not attempted by existing methods. Using human class I and II data, SWING accurately predicted murine class II peptide-major histocompatibility complex interactions involving risk alleles in systemic lupus erythematosus and type 1 diabetes. Based on sequence information alone, SWING accurately predicted how variants can disrupt specific protein-protein interactions. SWING outperformed passive uses of protein language model embeddings, demonstrating the value of the unique iLM architecture. Overall, SWING is a generalizable zero-shot iLM that learns the language of protein-protein interactions.
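The abstract describes SWING's encoding only at a high level: a window slides along the interacting sequences, and differences in amino-acid properties become an interaction vocabulary. The sketch below illustrates that general idea under assumptions that are not taken from the paper: a single property (the standard Kyte-Doolittle hydropathy scale), the mean absolute difference at each sliding offset, rounding to a single digit, and overlapping 3-mers as "words". The function names `interaction_profile` and `interaction_words` are hypothetical; this is not the authors' implementation.

```python
from typing import List

# Kyte-Doolittle hydropathy scale (standard published values).
# ASSUMPTION: using this single property is illustrative only; SWING's actual
# choice of amino-acid properties is not specified in this abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def interaction_profile(peptide: str, receptor: str) -> List[int]:
    """Hypothetical helper: slide the peptide along the receptor one residue at a
    time and, at each offset, record the mean absolute hydropathy difference
    between the peptide and the receptor segment it overlaps, rounded to an int.
    The maximum possible difference is 9.0, so every value is a digit 0..9."""
    k = len(peptide)
    if k == 0 or k > len(receptor):
        raise ValueError("peptide must be non-empty and no longer than the receptor")
    profile = []
    for offset in range(len(receptor) - k + 1):
        window = receptor[offset:offset + k]
        diffs = [abs(KYTE_DOOLITTLE[p] - KYTE_DOOLITTLE[r])
                 for p, r in zip(peptide, window)]
        # round() with no ndigits returns an int (round-half-to-even).
        profile.append(round(sum(diffs) / k))
    return profile


def interaction_words(profile: List[int], word_len: int = 3) -> List[str]:
    """Hypothetical helper: join the digit profile into a 'sentence' and split it
    into overlapping k-mer 'words', i.e. an interaction vocabulary that an
    embedding model could be trained on."""
    sentence = "".join(str(d) for d in profile)
    return [sentence[i:i + word_len] for i in range(len(sentence) - word_len + 1)]


if __name__ == "__main__":
    prof = interaction_profile("AL", "ALAL")
    print(prof)                         # [0, 2, 0]
    print(interaction_words(prof, 2))   # ['02', '20']
```

In a complete pipeline these word lists would be passed to an embedding model and a downstream classifier; which models SWING uses for those steps is not stated in this abstract.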