Institute of Biochemistry, Graz University of Technology, Graz, Austria.
BioTechMed, Graz, Austria.
Protein Sci. 2024 Dec;33(12):e5221. doi: 10.1002/pro.5221.
Protein structure prediction and (re)design have gone through a revolution in the last 3 years. The tremendous progress in these fields has been driven almost exclusively by readily available machine learning algorithms applied to protein folding and sequence design problems. Despite these advancements, predicting site-specific mutational effects on protein stability and function remains an unsolved problem. This challenge persists mainly because the free energy of large systems is very difficult to compute with absolute accuracy and because subtle changes to protein structures are hard to capture with computational models. Here, we describe the implementation and use of ESM-Scan, which uses the ESM zero-shot predictor to scan entire protein sequences for preferential amino acid changes, thus enabling in silico deep mutational scanning experiments. We benchmark ESM-Scan on its ability to predict the effects of sequence changes on stability and function using three publicly available datasets, and then experimentally test the tool's performance on a challenging test case, a blue-light-activated diguanylate cyclase from Methylotenera species (MsLadC), where it accurately predicted the importance of a highly conserved residue in a region involved in allosteric product inhibition. Our experimental results show that the ESM zero-shot model is capable of inferring the effects of a set of amino acid substitutions, as reflected in the correlation between predicted fitness and experimental results. ESM-Scan is publicly available at https://huggingface.co/spaces/thaidaev/zsp.
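The scanning idea can be illustrated with a minimal sketch. It assumes a log-odds score, log p(mutant) - log p(wild type) at the mutated position, which is a common way to score variants zero-shot with a masked protein language model; the abstract does not state the exact scoring used by ESM-Scan. The function names `scan_sequence` and `toy_log_probs` are hypothetical, and `toy_log_probs` is only a stand-in for the per-position log-probabilities a real model would supply.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def scan_sequence(sequence, log_probs):
    """Score every single substitution as log p(mutant) - log p(wild type).

    log_probs[i] maps each amino acid to a log-probability at position i.
    In practice these would come from a protein language model; here they
    are supplied by the caller (assumption for illustration).
    """
    scores = {}
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the identity "mutation"
            label = f"{wt}{i + 1}{aa}"  # e.g. "M1A", 1-based position
            scores[label] = log_probs[i][aa] - log_probs[i][wt]
    return scores


def toy_log_probs(sequence, wt_prob=0.62):
    """Hypothetical stand-in for a model: favors the wild-type residue."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in sequence
    ]


if __name__ == "__main__":
    seq = "MKT"  # toy sequence, not MsLadC
    scores = scan_sequence(seq, toy_log_probs(seq))
    # Rank substitutions from most to least favorable under the toy model.
    for label, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(label, round(score, 3))
```

Under this convention a score above zero marks a substitution the model prefers over the wild-type residue, and a score below zero marks one it disfavors. Swapping `toy_log_probs` for real model output leaves the scanning loop unchanged.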