Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Affiliations

Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706.

Morgridge Institute for Research, Madison, WI 53715.

Publication information

Proc Natl Acad Sci U S A. 2021 Nov 30;118(48). doi: 10.1073/pnas.2104878118.

Abstract

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
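To make the idea of a network that "shares parameters across sequence positions" and captures nonlinear effects more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`apply_mutations`, `one_hot`, `conv1d_relu`), the toy five-residue sequence, and the hand-set filter weights are all hypothetical, chosen only to illustrate how a variant might be one-hot encoded and passed through a single convolutional filter followed by a ReLU.

```python
# Hypothetical sketch: the helper names, toy sequence, and filter weights are
# illustrative assumptions and are not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_mutations(wild_type, mutations):
    """Apply substitutions such as 'K2A' (1-based position) to a sequence."""
    seq = list(wild_type)
    for mut in mutations:
        ref, pos, new = mut[0], int(mut[1:-1]), mut[-1]
        if seq[pos - 1] != ref:
            raise ValueError(f"{mut}: expected {ref} at position {pos}")
        seq[pos - 1] = new
    return "".join(seq)


def one_hot(sequence):
    """Encode a sequence as a list of 20-dimensional one-hot rows."""
    rows = []
    for aa in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(encoded, kernel, bias=0.0):
    """Slide one filter (width x 20 weights) along the sequence.

    The same weights are reused for every window, which is the sense in
    which parameters are shared across positions; the ReLU makes the
    response nonlinear.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(
                w * x for w, x in zip(kernel[offset], encoded[start + offset])
            )
        outputs.append(max(0.0, total))
    return outputs


wild_type = "MKTAG"  # toy sequence, not a real protein
variant = apply_mutations(wild_type, ["K2A", "G5W"])
encoded = one_hot(variant)

# Toy filter of width 2 that only fires when 'A' is followed by 'T'.
kernel = [[0.0] * len(AMINO_ACIDS) for _ in range(2)]
kernel[0][AA_INDEX["A"]] = 1.0
kernel[1][AA_INDEX["T"]] = 1.0
activations = conv1d_relu(encoded, kernel, bias=-1.0)
```

With the negative bias, the filter responds only when both residues are present together, so its output is not a sum of independent per-position effects. In a trained model the filter weights would be learned from the deep mutational scanning scores rather than set by hand.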

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d80/8640744/d6cebc83e527/pnas.202104878fig01.jpg
