Fan Lingxi, Wang Hui, Gao Han, Ding Yekun, Zhao Jintong, Luo Huiying, Tu Tao, Wu Ningfeng, Yao Bin, Guan Feifei, Tian Jian, Huang Huoqing
National Key Laboratory of Agricultural Microbiology, Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
State Key Laboratory of Animal Nutrition and Feeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
iScience. 2025 Aug 5;28(9):113273. doi: 10.1016/j.isci.2025.113273. eCollection 2025 Sep 19.
Generative models have transformed protein design by enabling the generation of large sequence datasets. However, accurately identifying biologically active sequences with specific functions within such data remains a significant challenge. In this study, we present a novel pipeline that integrates models for sequence generation, ranking, and selection to engineer proteins with enhanced properties. Our Omni-Directional Multipoint Mutagenesis (ODM) generation model was developed by refining a pre-trained protein BERT model and was used to generate 100,000 mutant proteins. To evaluate the effects of mutations on protein activity, we used the lowest predicted probability across all masked positions as the indicator for ranking the mutant sequences. Furthermore, we developed thermostability models to identify protease mutants with improved thermostability, and we used biological indicators to enhance lysozyme activity by introducing additional basic residues. Over two iterative design cycles, 62.5% of the protease mutants exhibited enhanced thermostability, and 50% of the lysozyme mutants displayed increased bacteriolytic activity.
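The ranking indicator can be illustrated with a minimal sketch, assuming that for each mutant we already have one probability table per masked position (amino acid mapped to the probability the masked language model assigns at that position). The names `min_masked_probability`, `rank_mutants`, and `basic_residue_gain` are hypothetical, as are the assumptions that a higher minimum probability marks a more favorable mutant and that Lys, Arg, and His count as "basic residues" for the lysozyme indicator. This is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: amino acid -> model probability at one masked position.
PositionProbs = Dict[str, float]

# Assumption: Lys (K), Arg (R) and His (H) are counted as basic residues.
BASIC_RESIDUES = frozenset("KRH")


def min_masked_probability(mutant: str,
                           masked_positions: Sequence[int],
                           probs: Sequence[PositionProbs]) -> float:
    """Lowest probability the model assigns to the mutant's residue over all masked positions."""
    if not masked_positions or len(masked_positions) != len(probs):
        raise ValueError("need one probability table per masked position")
    # Residues absent from a table are treated as probability 0.0 (assumption).
    return min(table.get(mutant[pos], 0.0)
               for pos, table in zip(masked_positions, probs))


def rank_mutants(candidates: List[Tuple[str, Sequence[int], Sequence[PositionProbs]]]
                 ) -> List[Tuple[float, str]]:
    """Score each (sequence, masked positions, probability tables) triple and sort best-first.

    Assumption: a higher minimum probability is treated as a more favorable mutant.
    """
    scored = [(min_masked_probability(seq, pos, p), seq) for seq, pos, p in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


def basic_residue_gain(wild_type: str, mutant: str) -> int:
    """Net number of basic residues gained by the mutant relative to the wild type."""
    def count(seq: str) -> int:
        return sum(aa in BASIC_RESIDUES for aa in seq)
    return count(mutant) - count(wild_type)


if __name__ == "__main__":
    wild_type = "MKTAY"  # toy sequence, not from the study
    ranked = rank_mutants([
        ("MATAK", [1, 4], [{"A": 0.2, "K": 0.7}, {"K": 0.5, "Y": 0.4}]),
        ("MKRAY", [2], [{"R": 0.6, "T": 0.3}]),
    ])
    for score, seq in ranked:
        print(seq, score, basic_residue_gain(wild_type, seq))
```

Taking the minimum rather than the mean means a single poorly supported substitution is enough to push a multipoint mutant down the ranking, which is one plausible reading of "lowest probability across all masked positions"; a task-specific filter such as the basic-residue gain can then be applied to the top-ranked sequences.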