Neural network extrapolation to distant regions of the protein fitness landscape.

Affiliations

Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA.

Department of Chemical & Biological Engineering, University of Wisconsin-Madison, Madison, WI, USA.

Publication information

Nat Commun. 2024 Jul 30;15(1):6405. doi: 10.1038/s41467-024-50712-3.

Abstract

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. We also find that implementing a simple ensemble of convolutional neural networks enables robust design of high-performing variants in the local landscape. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape and how a simple ensembling approach makes protein engineering more robust.
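The ensembling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the individual networks' predictions are combined, so the two aggregation rules below (median, and a conservative minimum across models) are assumptions, as are the function names and the toy scores; real predictions would come from trained models.

```python
from statistics import median


def ensemble_scores(predictions, mode="median"):
    """Aggregate per-model fitness predictions for each candidate sequence.

    predictions[m][i] is model m's predicted fitness for candidate i.
    mode="median" gives a central estimate across models (assumed rule);
    mode="min" gives a conservative estimate that rewards only candidates
    every model scores well (also an assumed rule).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    per_candidate = list(zip(*predictions))  # transpose to candidate-major
    if mode == "median":
        return [median(scores) for scores in per_candidate]
    if mode == "min":
        return [min(scores) for scores in per_candidate]
    raise ValueError(f"unknown mode: {mode!r}")


def rank_candidates(candidates, predictions, mode="median"):
    """Return (candidate, score) pairs sorted from best to worst."""
    scores = ensemble_scores(predictions, mode)
    return sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)


# Hypothetical scores from three models for three placeholder designs.
candidates = ["design_A", "design_B", "design_C"]
predictions = [
    [1.0, 2.0, 0.5],   # model 1
    [1.2, -3.0, 0.6],  # model 2 strongly disagrees about design_B
    [0.9, 2.5, 0.4],   # model 3
]

median_ranking = rank_candidates(candidates, predictions, mode="median")
conservative_ranking = rank_candidates(candidates, predictions, mode="min")
```

In this toy case the median ranks design_B first, while the conservative rule demotes it because one model predicts low fitness. That is the kind of disagreement an ensemble can surface when individual models extrapolate differently from the same training data.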


Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99a4/11289474/7447e31b8332/41467_2024_50712_Fig1_HTML.jpg
