Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, China.
Shanghai National Center for Applied Mathematics (SJTU Center), Shanghai 200240, China.
J Chem Inf Model. 2024 May 13;64(9):3650-3661. doi: 10.1021/acs.jcim.4c00036. Epub 2024 Apr 17.
Protein engineering faces the challenge of finding optimal mutants within a massive pool of candidates. In this study, we introduce ProtLGN, a data-efficient, deep-learning-based fitness prediction tool to steer protein engineering. Our methodology establishes a lightweight graph neural network scheme for protein structures, which efficiently analyzes the microenvironment of amino acids in wild-type proteins and reconstructs the distribution of amino acid sequences that are more likely to pass natural selection. This distribution serves as general guidance for scoring mutants with any number of mutation sites toward arbitrary properties. We validate the approach in extensive wet-lab experiments spanning diverse physicochemical properties of various proteins, including fluorescence intensity, antigen-antibody affinity, thermostability, and DNA cleavage activity. More than 40% of ProtLGN-designed single-site mutants outperform their wild-type counterparts across all studied proteins and targeted properties. More importantly, our model can bypass the negative epistatic effect when combining single mutation sites, forming deep mutants with up to seven mutation sites in a single round whose physicochemical properties are significantly improved. This observation provides compelling evidence of the structure-based model's potential to guide deep mutations in protein engineering. Overall, our approach emerges as a versatile tool for protein engineering, benefiting both the computational and bioengineering communities.
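To make the idea of scoring mutants from a reconstructed amino acid distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes a model has already produced a per-position probability matrix over the 20 standard amino acids, and it scores a mutant as the sum of log-odds between mutant and wild-type residues, a common convention for zero-shot fitness prediction. The function name `score_mutant`, the mutation string format (e.g. "A1V"), and the additive treatment of multi-site mutants are all assumptions made for illustration; the abstract does not specify how ProtLGN scores combinations.

```python
import numpy as np

# Standard 20 amino acids; the column order of `probs` is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def score_mutant(probs, wild_type, mutations):
    """Hypothetical log-odds score for a (possibly multi-site) mutant.

    probs:      array of shape (L, 20); row i is a predicted distribution
                over AMINO_ACIDS at position i of the wild-type structure.
    wild_type:  wild-type sequence of length L.
    mutations:  strings such as "A1V" (wild-type residue, 1-based
                position, mutant residue).
    """
    score = 0.0
    for mutation in mutations:
        wt, pos, mt = mutation[0], int(mutation[1:-1]) - 1, mutation[-1]
        if wild_type[pos] != wt:
            raise ValueError(f"{mutation}: wild type has {wild_type[pos]} at this site")
        # Positive values mean the model prefers the mutant residue here.
        score += np.log(probs[pos, AA_INDEX[mt]]) - np.log(probs[pos, AA_INDEX[wt]])
    return float(score)


# Toy example: a 3-residue protein with uniform predictions, except that
# position 1 favours V (0.5) over the wild-type A (0.25).
toy_probs = np.full((3, 20), 1 / 20)
toy_probs[0] = 0.25 / 18
toy_probs[0, AA_INDEX["V"]] = 0.5
toy_probs[0, AA_INDEX["A"]] = 0.25

single = score_mutant(toy_probs, "ACD", ["A1V"])          # log(0.5 / 0.25)
double = score_mutant(toy_probs, "ACD", ["A1V", "C2D"])   # second site is neutral
```

Ranking candidates by such a score and testing the top few is one way a distribution of this kind could guide mutant selection. Summing per-site terms ignores interactions between sites, so this sketch should not be read as capturing the epistatic behavior reported above.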