Transfer learning to leverage larger datasets for improved prediction of protein stability changes.

Affiliations

Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599.

Division of Chemical Biology and Medicinal Chemistry, University of North Carolina Eshelman School of Pharmacy, Chapel Hill, NC 27599.

Publication information

Proc Natl Acad Sci U S A. 2024 Feb 6;121(6):e2314853121. doi: 10.1073/pnas.2314853121. Epub 2024 Jan 29.

DOI:10.1073/pnas.2314853121

PMID:38285937

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10861915/

Abstract

Amino acid mutations that lower a protein's thermodynamic stability are implicated in numerous diseases, and engineered proteins with enhanced stability can be important in research and medicine. Computational methods for predicting how mutations perturb protein stability are, therefore, of great interest. Despite recent advancements in protein design using deep learning, in silico prediction of stability changes has remained challenging, in part due to a lack of large, high-quality training datasets for model development. Here, we describe ThermoMPNN, a deep neural network trained to predict stability changes for protein point mutations given an initial structure. In doing so, we demonstrate the utility of a recently released megascale stability dataset for training a robust stability model. We also employ transfer learning to leverage a second, larger dataset by using learned features extracted from ProteinMPNN, a deep neural network trained to predict a protein's amino acid sequence given its three-dimensional structure. We show that our method achieves state-of-the-art performance on established benchmark datasets using a lightweight model architecture that allows for rapid, scalable predictions. Finally, we make ThermoMPNN readily available as a tool for stability prediction and design.
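The transfer-learning idea in the abstract (reuse features from a pretrained network, then train a small model on stability data) can be illustrated with a minimal sketch. This is not ThermoMPNN's implementation and it does not call ProteinMPNN: the "pretrained" encoder here is a hypothetical stand-in built from fixed random per-residue embeddings, it ignores protein structure entirely, and the stability values in the usage example are made-up toy numbers. All function names are assumptions introduced for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_frozen_encoder(dim=8, seed=0):
    """Hypothetical stand-in for a pretrained network.

    Returns a function mapping a point mutation to a feature vector.
    The embedding table is fixed at creation and never updated, which
    mimics reusing learned features without retraining them.
    """
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1, 1) for _ in range(dim)] for aa in AMINO_ACIDS}

    def encode(wild_type, mutant):
        # Concatenate wild-type and mutant residue embeddings.
        return table[wild_type] + table[mutant]

    return encode


def train_head(features, targets, lr=0.02, epochs=500):
    """Fit a linear head (weights and bias) by gradient descent on MSE.

    Only these parameters are trained; the encoder stays frozen.
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(d):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(encode, head, wild_type, mutant):
    """Predict a stability change for one point mutation."""
    w, b = head
    x = encode(wild_type, mutant)
    return sum(wi * xi for wi, xi in zip(w, x)) + b


if __name__ == "__main__":
    encode = make_frozen_encoder()
    # Toy (wild type, mutant, stability change) triples; values are invented.
    toy = [("A", "G", 1.2), ("L", "P", 2.5), ("K", "R", 0.1), ("V", "I", -0.3)]
    X = [encode(wt, mt) for wt, mt, _ in toy]
    y = [ddg for _, _, ddg in toy]
    head = train_head(X, y)
    print(predict(encode, head, "L", "P"))
```

The design point the sketch tries to convey is the split between a frozen feature extractor, which carries knowledge from the larger dataset, and a lightweight trainable head, which is the only part fitted to stability measurements.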
