LM-GVP: an extensible sequence and structure informed deep learning framework for protein property prediction.

Affiliations

Amazon Machine Learning Solutions Lab, Amazon Web Services, Santa Clara, CA, USA.

Janssen Biotherapeutics, The Janssen Pharmaceutical Companies of Johnson & Johnson, Spring House, PA, USA.

Publication information

Sci Rep. 2022 Apr 27;12(1):6832. doi: 10.1038/s41598-022-10775-y.

DOI: 10.1038/s41598-022-10775-y
PMID: 35477726
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9046255/
Abstract

Proteins perform many essential functions in biological systems and can be successfully developed as bio-therapeutics. It is invaluable to be able to predict their properties based on a proposed sequence and structure. In this study, we developed a novel generalizable deep learning framework, LM-GVP, composed of a protein Language Model (LM) and Graph Neural Network (GNN) to leverage information from both 1D amino acid sequences and 3D structures of proteins. Our approach outperformed the state-of-the-art protein LMs on a variety of property prediction tasks including fluorescence, protease stability, and protein functions from Gene Ontology (GO). We also illustrated insights into how a GNN prediction head can inform the fine-tuning of protein LMs to better leverage structural information. We envision that our deep learning framework will be generalizable to many protein property prediction problems to greatly accelerate protein engineering and drug development.
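
The combination described in the abstract, per-residue language-model embeddings used as node features of a graph built from 3D coordinates, can be illustrated with a minimal sketch. This is not the authors' LM-GVP implementation: random arrays stand in for the protein LM output and the residue coordinates, a k-nearest-neighbour graph with one mean-aggregation message-passing layer stands in for the GVP network, and every name (`knn_graph`, `message_passing_layer`, `predict_property`) and size below is hypothetical.

```python
import numpy as np


def knn_graph(coords, k):
    """Return an (n, k) array holding the k nearest residues of each residue."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a residue is never its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def message_passing_layer(h, neighbours, w_self, w_neigh):
    """One round of mean-aggregation message passing followed by ReLU."""
    neigh_mean = h[neighbours].mean(axis=1)  # (n, d): average over neighbours
    return np.maximum(h @ w_self + neigh_mean @ w_neigh, 0.0)


def predict_property(embeddings, coords, params, k=3):
    """Sequence features (embeddings) + structure (coords) -> one scalar."""
    neighbours = knn_graph(coords, k)
    h = message_passing_layer(
        embeddings, neighbours, params["w_self"], params["w_neigh"]
    )
    pooled = h.mean(axis=0)  # graph-level readout over all residues
    return float(pooled @ params["w_out"])


# Assumption: placeholder inputs. In a real pipeline the embeddings would come
# from a pretrained protein LM and the coordinates from a structure file.
rng = np.random.default_rng(0)
n_residues, d_in, d_hidden = 8, 16, 4
embeddings = rng.normal(size=(n_residues, d_in))
coords = rng.normal(size=(n_residues, 3))
params = {
    "w_self": rng.normal(size=(d_in, d_hidden)),
    "w_neigh": rng.normal(size=(d_in, d_hidden)),
    "w_out": rng.normal(size=d_hidden),
}
score = predict_property(embeddings, coords, params)
print(score)
```

In this sketch the graph head is the only part with weights. Fine-tuning the language model through the graph head, as the abstract describes, would additionally require propagating gradients back into the network that produces `embeddings`, which an automatic-differentiation framework would handle.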

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4da1/9046255/c211b31d30eb/41598_2022_10775_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4da1/9046255/bd807dfff4aa/41598_2022_10775_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4da1/9046255/8877623d1746/41598_2022_10775_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4da1/9046255/774dadd24e66/41598_2022_10775_Fig4_HTML.jpg
