

Contextual protein and antibody encodings from equivariant graph transformers.

Author Information

Mahajan Sai Pooja, Ruffolo Jeffrey A, Gray Jeffrey J

Affiliations

Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States.

Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, United States.

Publication Information

bioRxiv. 2023 Jul 29:2023.07.15.549154. doi: 10.1101/2023.07.15.549154.

Abstract

The optimal residue identity at each position in a protein is determined by its structural, evolutionary, and functional context. We seek to learn the representation space of the optimal amino-acid residue in different structural contexts in proteins. Inspired by masked language modeling (MLM), our training aims to transduce learning of amino-acid labels from non-masked residues to masked residues in their structural environments and from general (e.g., a residue in a protein) to specific contexts (e.g., a residue at the interface of a protein or antibody complex). Our results on native sequence recovery and forward folding with AlphaFold2 suggest that the amino acid label for a protein residue may be determined from its structural context alone (i.e., without knowledge of the sequence labels of surrounding residues). We further find that the sequence space sampled from our masked models recapitulates the evolutionary sequence neighborhood of the wildtype sequence. Remarkably, the sequences conditioned on highly plastic structures recapitulate the conformational flexibility encoded in the structures. Furthermore, maximum-likelihood interfaces designed with masked models recapitulate wildtype binding energies for a wide range of protein interfaces and binding strengths. We also propose and compare fine-tuning strategies to train models for designing CDR loops of antibodies in the structural context of the antibody-antigen interface by leveraging structural databases for proteins, antibodies (synthetic and experimental), and protein-protein complexes. We show that pretraining on more general contexts improves native sequence recovery for antibody CDR loops, especially for the hypervariable CDR H3, while fine-tuning helps to preserve patterns observed in special contexts.
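The MLM-style objective described in the abstract can be illustrated in miniature: mask a random subset of amino-acid labels, then train a model to recover exactly those positions from the unmasked context. The snippet below shows only the masking step on a toy sequence; the function name `mask_sequence`, the mask token `X`, and the 15% mask fraction are illustrative assumptions, not the authors' implementation.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "X"  # placeholder token standing in for a masked residue label

def mask_sequence(seq, mask_frac=0.15, rng=None):
    """Replace a random fraction of residue labels with a mask token.

    Returns the masked sequence and the sorted indices of masked positions;
    in an MLM-style objective, only these positions contribute to the
    training loss, forcing the model to infer them from context.
    """
    rng = rng or random.Random(0)
    n_mask = max(1, int(len(seq) * mask_frac))
    idx = rng.sample(range(len(seq)), n_mask)
    masked = list(seq)
    for i in idx:
        masked[i] = MASK
    return "".join(masked), sorted(idx)

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence, not from the paper
masked, positions = mask_sequence(seq)
```

Setting `mask_frac=1.0` corresponds to the fully masked setting the abstract alludes to, in which every residue label must be inferred from the structural context alone.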


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba0a/10395198/504c908ded96/nihpp-2023.07.15.549154v2-f0001.jpg
