Using machine learning to predict the effects and consequences of mutations in proteins.

Author information

Department of Chemistry, The University of Texas at Austin, 105 E 24TH St., Austin, 78712, Texas, USA; Department of Molecular Biosciences, The University of Texas at Austin, 100 East 24th St., Stop A5000, Austin, 78712, Texas, USA. Electronic address: https://twitter.com/aiproteins.

Department of Integrative Biology, The University of Texas at Austin, 2415 Speedway, Stop C0930, Austin, 78712, Texas, USA.

Publication information

Curr Opin Struct Biol. 2023 Feb;78:102518. doi: 10.1016/j.sbi.2022.102518. Epub 2023 Jan 3.

DOI:10.1016/j.sbi.2022.102518

PMID:36603229

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9908841/

Abstract

Machine and deep learning approaches can leverage the increasingly available massive datasets of protein sequences, structures, and mutational effects to predict variants with improved fitness. Many different approaches are being developed, but systematic benchmarking studies indicate that even though the specifics of the machine learning algorithms matter, the more important constraint comes from the data availability and quality utilized during training. In cases where little experimental data are available, unsupervised and self-supervised pre-training with generic protein datasets can still perform well after subsequent refinement via hybrid or transfer learning approaches. Overall, recent progress in this field has been staggering, and machine learning approaches will likely play a major role in future breakthroughs in protein biochemistry and engineering.

Similar articles

Flattening the curve-How to get better results with small deep-mutational-scanning datasets.

Proteins. 2024 Jul;92(7):886-902. doi: 10.1002/prot.26686. Epub 2024 Mar 19.

Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Proc Natl Acad Sci U S A. 2021 Nov 30;118(48). doi: 10.1073/pnas.2104878118.

Machine Learning Methods for Small Data Challenges in Molecular Science.

Chem Rev. 2023 Jul 12;123(13):8736-8780. doi: 10.1021/acs.chemrev.3c00189. Epub 2023 Jun 29.

Learning the shape of protein microenvironments with a holographic convolutional neural network.

Proc Natl Acad Sci U S A. 2024 Feb 6;121(6):e2300838121. doi: 10.1073/pnas.2300838121. Epub 2024 Feb 1.

Transfer learning to leverage larger datasets for improved prediction of protein stability changes.

Proc Natl Acad Sci U S A. 2024 Feb 6;121(6):e2314853121. doi: 10.1073/pnas.2314853121. Epub 2024 Jan 29.

3-D Deconvolutional Networks for the Unsupervised Representation Learning of Human Motions.

IEEE Trans Cybern. 2022 Jan;52(1):398-410. doi: 10.1109/TCYB.2020.2973300. Epub 2022 Jan 11.

Machine learning random forest for predicting oncosomatic variant NGS analysis.

Sci Rep. 2021 Nov 8;11(1):21820. doi: 10.1038/s41598-021-01253-y.

Deep Dive into Machine Learning Models for Protein Engineering.

J Chem Inf Model. 2020 Jun 22;60(6):2773-2790. doi: 10.1021/acs.jcim.0c00073. Epub 2020 May 5.

ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations.

J Mol Biol. 2021 May 28;433(11):166810. doi: 10.1016/j.jmb.2021.166810. Epub 2021 Jan 13.

Cited by

Language Modelling Techniques for Analysing the Impact of Human Genetic Variation.

Bioinform Biol Insights. 2025 Sep 2;19:11779322251358314. doi: 10.1177/11779322251358314. eCollection 2025.

Medium-sized protein language models perform well at transfer learning on realistic datasets.

Sci Rep. 2025 Jul 1;15(1):21400. doi: 10.1038/s41598-025-05674-x.

Physical principles underpinning molecular-level protein evolution.

Eur Biophys J. 2025 Jun 26. doi: 10.1007/s00249-025-01776-6.

A systematic evaluation of the language-of-viral-escape model using multiple machine learning frameworks.

J R Soc Interface. 2025 Apr;22(225):20240598. doi: 10.1098/rsif.2024.0598. Epub 2025 Apr 30.

Decoding the effects of mutation on protein interactions using machine learning.

Biophys Rev (Melville). 2025 Feb 21;6(1):011307. doi: 10.1063/5.0249920. eCollection 2025 Mar.

Evolutionary rewiring of the dynamic network underpinning allosteric epistasis in NS1 of the influenza A virus.

Proc Natl Acad Sci U S A. 2025 Feb 25;122(8):e2410813122. doi: 10.1073/pnas.2410813122. Epub 2025 Feb 20.

Self-supervised machine learning methods for protein design improve sampling but not the identification of high-fitness variants.

Sci Adv. 2025 Feb 14;11(7):eadr7338. doi: 10.1126/sciadv.adr7338. Epub 2025 Feb 12.

Deep Learning Approaches for the Prediction of Protein Functional Sites.

Molecules. 2025 Jan 7;30(2):214. doi: 10.3390/molecules30020214.

Comprehensive in silico analysis of single nucleotide polymorphism and molecular dynamics simulation of human GATA6 protein in ventricular septal defect.

Narra J. 2024 Dec;4(3):e1344. doi: 10.52225/narra.v4i3.1344. Epub 2024 Dec 10.

Scaling down for efficiency: Medium-sized protein language models perform well at transfer learning on realistic datasets.

bioRxiv. 2025 Jan 28:2024.11.22.624936. doi: 10.1101/2024.11.22.624936.

References

ProS-GNN: Predicting effects of mutations on protein stability using graph neural networks.

Comput Biol Chem. 2023 Dec;107:107952. doi: 10.1016/j.compbiolchem.2023.107952. Epub 2023 Aug 26.

Scalable hybrid deep neural networks/polarizable potentials biomolecular simulations including long-range effects.

Chem Sci. 2023 Apr 4;14(20):5438-5452. doi: 10.1039/d2sc04815a. eCollection 2023 May 24.

Robust deep learning-based protein sequence design using ProteinMPNN.

Science. 2022 Oct 7;378(6615):49-56. doi: 10.1126/science.add2187. Epub 2022 Sep 15.

Interpreting protein variant effects with computational predictors and deep mutational scanning.

Dis Model Mech. 2022 Jun 1;15(6). doi: 10.1242/dmm.049510. Epub 2022 Jun 23.

Machine learning-aided engineering of hydrolases for PET depolymerization.

Nature. 2022 Apr;604(7907):662-667. doi: 10.1038/s41586-022-04599-z. Epub 2022 Apr 27.

Linking protein structural and functional change to mutation using amino acid networks.

PLoS One. 2022 Jan 21;17(1):e0261829. doi: 10.1371/journal.pone.0261829. eCollection 2022.

Learning protein fitness models from evolutionary and assay-labeled data.

Nat Biotechnol. 2022 Jul;40(7):1114-1122. doi: 10.1038/s41587-021-01146-5. Epub 2022 Jan 17.

Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation.

Cell Rep. 2022 Jan 11;38(2):110207. doi: 10.1016/j.celrep.2021.110207.

ProteinBERT: a universal deep-learning model of protein sequence and function.

Bioinformatics. 2022 Apr 12;38(8):2102-2110. doi: 10.1093/bioinformatics/btac020.

Embeddings from protein language models predict conservation and variant effects.

Hum Genet. 2022 Oct;141(10):1629-1647. doi: 10.1007/s00439-021-02411-y. Epub 2021 Dec 30.
