Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Affiliations

Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706.

Morgridge Institute for Research, Madison, WI 53715.

Publication information

Proc Natl Acad Sci U S A. 2021 Nov 30;118(48). doi: 10.1073/pnas.2104878118.

DOI:10.1073/pnas.2104878118

PMID:34815338

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8640744/

Abstract

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
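
The abstract contrasts architectures by whether they share parameters across sequence positions, capture nonlinear interactions, and use protein structure. The following is a toy, untrained forward pass for illustration only, not the authors' implementation, and the abstract does not specify layer details. It one-hot encodes a sequence, then applies either a sequence convolution (one filter reused at every window of adjacent residues) or a graph convolution (one weight vector reused at every residue, aggregating over structural contacts instead of sequence neighbors), each followed by a ReLU nonlinearity, sum-pooling, and a linear readout. All function names, the hand-set weights, the 3-residue window, and the contact list are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as one 20-dimensional one-hot vector per position."""
    encoded = []
    for aa in seq:
        vec = [0.0] * len(AMINO_ACIDS)
        vec[AA_INDEX[aa]] = 1.0
        encoded.append(vec)
    return encoded


def seq_conv_relu(x, filters):
    """Sequence convolution: each filter (shape [width][20]) is reused at every
    window of adjacent residues (parameter sharing), followed by ReLU."""
    width = len(filters[0])
    out = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        feats = []
        for f in filters:
            s = sum(f[k][a] * window[k][a]
                    for k in range(width) for a in range(len(AMINO_ACIDS)))
            feats.append(max(0.0, s))  # ReLU nonlinearity
        out.append(feats)
    return out


def graph_conv_relu(x, neighbors, weights):
    """Graph convolution: each residue i sums a shared linear map weights[o][a]
    over itself and its structural contacts neighbors[i], followed by ReLU."""
    out = []
    for i in range(len(x)):
        nodes = [i] + [j for j in neighbors[i] if j != i]
        feats = []
        for w_o in weights:
            s = sum(w_o[a] * x[j][a]
                    for j in nodes for a in range(len(AMINO_ACIDS)))
            feats.append(max(0.0, s))  # ReLU nonlinearity
        out.append(feats)
    return out


def readout(h, w):
    """Sum-pool per-position features, then apply a linear layer to get one score."""
    pooled = [sum(col) for col in zip(*h)]
    return sum(wi * p for wi, p in zip(w, pooled))


if __name__ == "__main__":
    W = AA_INDEX["W"]
    # Hypothetical width-3 filter that fires when W sits at the window center.
    w_filter = [[0.0] * len(AMINO_ACIDS) for _ in range(3)]
    w_filter[1][W] = 1.0
    # Hypothetical graph weights: one output channel that detects W.
    g_weights = [[0.0] * len(AMINO_ACIDS)]
    g_weights[0][W] = 1.0
    # Hypothetical contact list: residues 1 and 3 touch in 3D but are not adjacent.
    contacts = [[], [3], [], [1]]

    x = one_hot("AWAA")
    seq_h = seq_conv_relu(x, [w_filter])
    graph_h = graph_conv_relu(x, contacts, g_weights)
    print(seq_h)                    # [[1.0], [0.0]]
    print(graph_h)                  # [[0.0], [1.0], [0.0], [1.0]]
    print(readout(seq_h, [2.0]))    # 2.0
    print(readout(graph_h, [2.0]))  # 4.0
```

Because the filter and the graph weights do not depend on sequence length, the same parameters can score a variant of any length. In the graph layer, residue 3 responds to the W at residue 1 only because the two are listed as contacts, which is how structural information can enter such a model.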

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0d80/8640744/d6cebc83e527/pnas.202104878fig01.jpg

Similar articles

Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Proc Natl Acad Sci U S A. 2021 Nov 30;118(48). doi: 10.1073/pnas.2104878118.

Flattening the curve-How to get better results with small deep-mutational-scanning datasets.

Proteins. 2024 Jul;92(7):886-902. doi: 10.1002/prot.26686. Epub 2024 Mar 19.

PCP-GC-LM: single-sequence-based protein contact prediction using dual graph convolutional neural network and convolutional neural network.

BMC Bioinformatics. 2024 Sep 2;25(1):287. doi: 10.1186/s12859-024-05914-3.

Neural network extrapolation to distant regions of the protein fitness landscape.

Nat Commun. 2024 Jul 30;15(1):6405. doi: 10.1038/s41467-024-50712-3.

Fast and Flexible Protein Design Using Deep Graph Neural Networks.

Cell Syst. 2020 Oct 21;11(4):402-411.e4. doi: 10.1016/j.cels.2020.08.016. Epub 2020 Sep 23.

Deep convolutional neural network and IoT technology for healthcare.

Digit Health. 2024 Jan 17;10:20552076231220123. doi: 10.1177/20552076231220123. eCollection 2024 Jan-Dec.

Neural network extrapolation to distant regions of the protein fitness landscape.

bioRxiv. 2023 Nov 9:2023.11.08.566287. doi: 10.1101/2023.11.08.566287.

Computational Protein Design with Deep Learning Neural Networks.

Sci Rep. 2018 Apr 20;8(1):6349. doi: 10.1038/s41598-018-24760-x.

Using machine learning to predict the effects and consequences of mutations in proteins.

Curr Opin Struct Biol. 2023 Feb;78:102518. doi: 10.1016/j.sbi.2022.102518. Epub 2023 Jan 3.

Multimodal deep representation learning for protein interaction identification and protein family classification.

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):531. doi: 10.1186/s12859-019-3084-y.

Cited by

Prediction of enzyme function using an interpretable optimized ensemble learning framework.

Chem Sci. 2025 Sep 1. doi: 10.1039/d5sc04513d.

Biophysics-based protein language models for protein engineering.

Nat Methods. 2025 Sep 11. doi: 10.1038/s41592-025-02776-2.

Learning sequence-function relationships with scalable, interpretable Gaussian processes.

bioRxiv. 2025 Aug 19:2025.08.15.670613. doi: 10.1101/2025.08.15.670613.

ProDualNet: dual-target protein sequence design method based on protein language model and structure model.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf391.

AlphaCD: a machine learning model capable of highly accurate characterization for 21,335 cytidine deaminases.

Cell Res. 2025 Aug 18. doi: 10.1038/s41422-025-01164-x.

A side-by-side comparison of variant function measurements using deep mutational scanning and base editing.

Nucleic Acids Res. 2025 Jul 19;53(14). doi: 10.1093/nar/gkaf738.

Investigating the determinants of performance in machine learning for protein fitness prediction.

Protein Sci. 2025 Aug;34(8):e70235. doi: 10.1002/pro.70235.

GOBeacon: An ensemble model for protein function prediction enhanced by contrastive learning.

Protein Sci. 2025 Jul;34(7):e70182. doi: 10.1002/pro.70182.

Multiobjective learning and design of bacteriophage specificity.

bioRxiv. 2025 May 19:2025.05.19.654895. doi: 10.1101/2025.05.19.654895.

Designing diverse and high-performance proteins with a large language model in the loop.

PLoS Comput Biol. 2025 Jun 5;21(6):e1013119. doi: 10.1371/journal.pcbi.1013119. eCollection 2025 Jun.

References

Modeling mutational effects on biochemical phenotypes using convolutional neural networks: application to SARS-CoV-2.

iScience. 2022 Jul 15;25(7):104500. doi: 10.1016/j.isci.2022.104500. Epub 2022 Jun 2.

MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect.

Genome Biol. 2022 Apr 15;23(1):98. doi: 10.1186/s13059-022-02661-7.

Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions.

Nat Commun. 2021 Sep 1;12(1):5225. doi: 10.1038/s41467-021-25371-3.

Structure-based protein function prediction using graph convolutional networks.

Nat Commun. 2021 May 26;12(1):3168. doi: 10.1038/s41467-021-23303-9.

Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.

Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.

Low-N protein engineering with data-efficient deep learning.

Nat Methods. 2021 Apr;18(4):389-396. doi: 10.1038/s41592-021-01100-y. Epub 2021 Apr 7.

Generating functional protein variants with variational autoencoders.

PLoS Comput Biol. 2021 Feb 26;17(2):e1008736. doi: 10.1371/journal.pcbi.1008736. eCollection 2021 Feb.

Deep diversification of an AAV capsid protein by machine learning.

Nat Biotechnol. 2021 Jun;39(6):691-696. doi: 10.1038/s41587-020-00793-4. Epub 2021 Feb 11.

Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.

Cell Syst. 2021 Jan 20;12(1):92-101.e8. doi: 10.1016/j.cels.2020.10.007. Epub 2020 Nov 18.

Fast and Flexible Protein Design Using Deep Graph Neural Networks.

Cell Syst. 2020 Oct 21;11(4):402-411.e4. doi: 10.1016/j.cels.2020.08.016. Epub 2020 Sep 23.
