moPPIt: Generation of Motif-Specific Binders with Protein Language Models.

Authors

Chen Tong, Zhang Yinuo, Chatterjee Pranam

Affiliations

Department of Biomedical Engineering, Duke University.

Department of Biostatistics and Bioinformatics, Duke University.

Publication

bioRxiv. 2024 Aug 1:2024.07.31.606098. doi: 10.1101/2024.07.31.606098.

DOI:10.1101/2024.07.31.606098

PMID:39131360

Original article:https://pmc.ncbi.nlm.nih.gov/articles/PMC11312608/

Abstract

The ability to precisely target specific motifs on disease-related proteins, whether conserved epitopes on viral proteins, intrinsically disordered regions within transcription factors, or breakpoint junctions in fusion oncoproteins, is essential for modulating their function while minimizing off-target effects. Current methods struggle to achieve this specificity without reliable structural information. In this work, we introduce a motif-specific PPI targeting algorithm, moPPIt, for de novo generation of motif-specific peptide binders from the target protein sequence alone. At the core of moPPIt is BindEvaluator, a transformer-based model that interpolates protein language model embeddings of two proteins via a series of multi-headed self-attention blocks, with a key focus on local motif features. Trained on over 510,000 annotated PPIs, BindEvaluator accurately predicts target binding sites given protein-protein sequence pairs with a test AUC > 0.94, improving to AUC > 0.96 when fine-tuned on peptide-protein pairs. By combining BindEvaluator with our PepMLM peptide generator and genetic algorithm-based optimization, moPPIt generates peptides that bind specifically to user-defined residues on target proteins. We demonstrate moPPIt's efficacy in computationally designing binders to specific motifs, first on targets with known binding peptides and then extending to structured and disordered targets with no known binders. In total, moPPIt serves as a powerful tool for developing highly specific peptide therapeutics without relying on target structure or structure-dependent latent spaces.
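The genetic algorithm-based optimization mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy scoring function, `toy_motif_score`, as a hypothetical stand-in for BindEvaluator, and it seeds the population with random peptides, where moPPIt would draw candidates from PepMLM. All function names, parameters, and the example target sequence are invented for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIVE, NEGATIVE, HYDROPHOBIC = set("KR"), set("DE"), set("AILMFVW")


def toy_motif_score(peptide, target, motif):
    """Hypothetical stand-in scorer (NOT BindEvaluator).

    Returns the fraction of peptide-residue / motif-residue pairs that are
    oppositely charged or both hydrophobic. Purely illustrative.
    """
    motif_residues = [target[i] for i in motif]
    hits = 0
    for p in peptide:
        for m in motif_residues:
            opposite = (p in POSITIVE and m in NEGATIVE) or (p in NEGATIVE and m in POSITIVE)
            if opposite or (p in HYDROPHOBIC and m in HYDROPHOBIC):
                hits += 1
    return hits / (len(peptide) * len(motif_residues))


def mutate(peptide, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in peptide)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length peptides."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def optimize(target, motif, length=10, pop_size=40, generations=30, seed=0):
    """Evolve a peptide toward a high toy score on the chosen motif positions."""
    rng = random.Random(seed)
    # Assumption: random initial peptides stand in for generator-proposed candidates.
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: toy_motif_score(p, target, motif), reverse=True)
        parents = population[: pop_size // 4]  # elitism: the top quarter survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda p: toy_motif_score(p, target, motif))


if __name__ == "__main__":
    target = "MKTAYIAKQRDEELLKV"  # made-up sequence, not from the paper
    motif = [10, 11, 12]          # 0-based indices of the residues D, E, E
    best = optimize(target, motif)
    print(best, toy_motif_score(best, target, motif))
```

Because the top quarter of each generation is carried over unchanged, the best score never decreases from one generation to the next. In a realistic setting the scorer would be a learned model that also penalizes predicted binding outside the requested motif.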
