Yu Xin, Negron Christopher, Huang Lili, Veldman Geertruida
Biotherapeutics Discovery, AbbVie Bioresearch Center, 100 Research Drive, Worcester, MA 01605, USA.
Antib Ther. 2023 May 14;6(2):137-146. doi: 10.1093/abt/tbad011. eCollection 2023 Apr.
The emergence of deep learning models such as AlphaFold2 has revolutionized protein structure prediction. Nevertheless, much remains unexplored, especially regarding how structure models can be used to predict biological properties. Herein, we present a method that uses features extracted from protein language models (PLMs) to predict the major histocompatibility complex class II (MHC-II) binding affinity of peptides. Specifically, we evaluated a novel transfer learning approach in which the backbone of our model was interchanged with architectures designed for image classification tasks. Features extracted from several PLMs (ESM1b, ProtXLNet or ProtT5-XL-UniRef) were passed into image models (EfficientNet v2b0, EfficientNet v2m or ViT-16). The optimal pairing of PLM and image classifier resulted in the final model, TransMHCII, which outperformed NetMHCIIpan 3.2 and NetMHCIIpan 4.0-BA on the area under the receiver operating characteristic curve (ROC AUC), balanced accuracy and Jaccard scores. This architectural innovation may facilitate the development of other deep learning models for biological problems.
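For readers unfamiliar with the three evaluation metrics named in the abstract (ROC AUC, balanced accuracy and the Jaccard score), the following is a minimal sketch of their standard definitions for a binary binder/non-binder task. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not state how binding affinities were binarized.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair (Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on binders) and specificity (recall on non-binders)."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("balanced accuracy needs both classes in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def jaccard(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Jaccard score for the positive class: tp / (tp + fp + fn)."""
    tp, fp, _tn, fn = confusion(y_true, y_pred)
    denom = tp + fp + fn
    # Convention chosen here: return 0.0 when there are no positives at all.
    return tp / denom if denom else 0.0


# Toy data (hypothetical): 1 = binder, 0 = non-binder.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
THRESHOLD = 0.5  # assumed cut-off, not taken from the paper
y_pred = [1 if s >= THRESHOLD else 0 for s in y_score]

auc = roc_auc(y_true, y_score)
bal_acc = balanced_accuracy(y_true, y_pred)
jac = jaccard(y_true, y_pred)
print(f"ROC AUC={auc:.4f}  balanced accuracy={bal_acc:.4f}  Jaccard={jac:.4f}")
```

ROC AUC is computed from the continuous scores and so is independent of the threshold, whereas balanced accuracy and the Jaccard score depend on the binarized predictions. Balanced accuracy and Jaccard are often reported alongside AUC when binders are much rarer than non-binders, since plain accuracy can look high for a model that predicts "non-binder" everywhere.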