Department of Computer Science, Brigham Young University, Provo, UT, USA.
Department of Chemical Engineering, Brigham Young University, Provo, UT, USA.
Sci Rep. 2023 Sep 19;13(1):15493. doi: 10.1038/s41598-023-42032-1.
Various approaches have used neural networks as probabilistic models for the design of protein sequences. These "inverse folding" models employ different objective functions, which come with trade-offs that have not previously been assessed in detail. This study introduces probabilistic definitions of protein stability and conformational specificity and demonstrates the relationship between these chemical properties and the p(structure | sequence) Boltzmann probability objective. This links the Boltzmann probability objective function to experimentally verifiable outcomes. We propose a novel sequence decoding algorithm, referred to as "BayesDesign", that leverages Bayes' Rule to maximize the p(structure | sequence) objective instead of the p(sequence | structure) objective common in inverse folding models. The efficacy of BayesDesign is evaluated on two protein model systems, the NanoLuc enzyme and the WW structural motif. Both BayesDesign and the baseline ProteinMPNN algorithm increase the thermostability of NanoLuc and increase the conformational specificity of WW. The possible sources of error in the model are analyzed.
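To make the role of Bayes' Rule concrete: assuming the target objective is p(structure | sequence), Bayes' Rule gives p(structure | sequence) = p(sequence | structure) p(structure) / p(sequence). For a fixed target structure, p(structure) is constant, so maximizing over sequences amounts to maximizing log p(sequence | structure) - log p(sequence). The following is a minimal toy sketch of that scoring idea only, not the authors' implementation. It assumes independent positions and uses hypothetical hand-written probability tables in place of a real inverse folding model and a real structure-independent sequence model; the function names `bayes_score` and `greedy_decode` are illustrative.

```python
import math


def bayes_score(log_p_seq_given_struct, log_p_seq):
    """Return log p(structure | seq) up to the constant log p(structure)."""
    return log_p_seq_given_struct - log_p_seq


def greedy_decode(cond_probs, prior_probs):
    """Choose, per position, the residue maximizing log p(aa | structure) - log p(aa).

    cond_probs[i][aa]  : hypothetical stand-in for an inverse folding model's output
    prior_probs[i][aa] : hypothetical stand-in for a structure-independent sequence prior
    Positions are treated as independent, which is a simplifying assumption.
    """
    seq = []
    for cond, prior in zip(cond_probs, prior_probs):
        best = max(
            cond,
            key=lambda aa: bayes_score(math.log(cond[aa]), math.log(prior[aa])),
        )
        seq.append(best)
    return "".join(seq)


def baseline_decode(cond_probs):
    """Baseline: choose the residue maximizing p(aa | structure) alone."""
    return "".join(max(cond, key=cond.get) for cond in cond_probs)


# Toy two-position example with made-up probabilities.
cond = [{"A": 0.6, "W": 0.4}, {"G": 0.55, "P": 0.45}]
prior = [{"A": 0.9, "W": 0.1}, {"G": 0.3, "P": 0.7}]

baseline = baseline_decode(cond)      # picks the most likely residue given structure
design = greedy_decode(cond, prior)   # penalizes residues that are likely everywhere
```

In this toy case the two decoders disagree at the first position: alanine is the most probable residue given the structure, but it is also very common under the prior, so the ratio favors tryptophan. This illustrates why the two objectives can yield different designs.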