State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China.
Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, No.5 Yiheyuan Road, Beijing, 100871, China.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae135.
Protein design is central to nearly all protein engineering problems, as it enables the creation of proteins with new or improved biological functions, such as enzymes with higher catalytic efficiency. One key facet of protein design, fixed-backbone protein sequence design, seeks new sequences that will fold into a prescribed protein backbone structure. However, existing sequence design methods have limitations, such as low sequence diversity and insufficient experimental validation of the designed functional proteins. These shortcomings hinder functional protein design. To address them, we developed the Graphormer-based Protein Design (GPD) model. The model applies a Transformer to a graph-based representation of three-dimensional protein structures and adds Gaussian noise and random sequence masks to the node features, thereby improving both sequence recovery and sequence diversity. GPD performed significantly better than the state-of-the-art ProteinMPNN model on multiple independent tests, especially in sequence diversity. We used GPD to redesign CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity relative to wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code is available at https://github.com/decodermu/GPD.
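The abstract names two input perturbations (Gaussian noise and random sequence masks on node features) and two evaluation criteria (sequence recovery and diversity). The sketch below illustrates these ideas in plain Python. It is not taken from the GPD repository: the function names, the one-hot encoding with an extra mask slot, the noise scale, the mask rate, and the definition of diversity as one minus mean pairwise identity are all assumptions made for illustration.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK_INDEX = len(AMINO_ACIDS)  # assumed: one extra slot marks a hidden residue


def add_gaussian_noise(features, sigma, rng):
    """Return a copy of per-residue feature vectors with N(0, sigma^2) noise added."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in features]


def mask_sequence(sequence, mask_rate, rng):
    """One-hot encode a sequence, hiding each residue with probability mask_rate."""
    encoded, masked_positions = [], []
    for i, aa in enumerate(sequence):
        row = [0.0] * (len(AMINO_ACIDS) + 1)
        if rng.random() < mask_rate:
            row[MASK_INDEX] = 1.0  # residue identity is hidden from the model
            masked_positions.append(i)
        else:
            row[AMINO_ACIDS.index(aa)] = 1.0
        encoded.append(row)
    return encoded, masked_positions


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def pairwise_diversity(designs):
    """Assumed diversity measure: 1 minus the mean pairwise identity of the designs."""
    pairs = list(combinations(designs, 2))
    if not pairs:
        raise ValueError("need at least two designs")
    return 1.0 - sum(sequence_recovery(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbations are repeatable
    node_features, hidden = mask_sequence("ACDEFGHIKL", mask_rate=0.3, rng=rng)
    noisy_features = add_gaussian_noise(node_features, sigma=0.1, rng=rng)
    print("masked positions:", hidden)
    print("recovery:", sequence_recovery("ACDF", "ACDE"))
    print("diversity:", pairwise_diversity(["ACDE", "ACDF", "GCDE"]))
```

In this toy setup, a higher mask rate or noise scale gives the model less information about the native sequence, which is one plausible way such perturbations could trade recovery against diversity. The values actually used in GPD should be taken from the paper or the repository.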