Unified rational protein engineering with sequence-based deep representation learning.

Affiliations

Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.

MIT Media Laboratory, Cambridge, MA, USA.

Publication information

Nat Methods. 2019 Dec;16(12):1315-1322. doi: 10.1038/s41592-019-0598-1. Epub 2019 Oct 21.

Abstract

Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.
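The workflow the abstract describes, a fixed-length representation of a sequence with a simple model trained on top, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `featurize` is a hypothetical stand-in that uses amino-acid composition rather than the learned UniRep representation, the sequences and labels are invented toy data, and the ridge-style linear model is just one example of a "simple model".

```python
# Toy sketch: fixed-length sequence representation + simple linear top model.
# ASSUMPTION: `featurize` is a hypothetical stand-in (amino-acid composition),
# NOT the learned UniRep representation described in the abstract.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def featurize(seq):
    """Map a variable-length sequence to a fixed-length vector of 20 frequencies."""
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def predict(weights, bias, x):
    """Linear prediction: bias plus the dot product of weights and features."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def fit_linear(X, y, lr=0.5, l2=1e-3, epochs=2000):
    """Fit an L2-regularized linear model by batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Errors are computed once per epoch, so all updates use the same batch.
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        for j in range(d):
            grad = sum(e * x[j] for e, x in zip(errors, X)) / n + l2 * weights[j]
            weights[j] -= lr * grad
        bias -= lr * sum(errors) / n
    return weights, bias


def mse(weights, bias, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(weights, bias, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


# Invented toy data: the label is simply the fraction of alanine (A).
train_seqs = ["AAAA", "AAKK", "KKKK", "AKAK", "AAAK"]
train_y = [1.0, 0.5, 0.0, 0.5, 0.75]
train_X = [featurize(s) for s in train_seqs]

weights, bias = fit_linear(train_X, train_y)
train_error = mse(weights, bias, train_X, train_y)

# Score a sequence of a different length that was not in the training set.
new_score = predict(weights, bias, featurize("AAKKAA"))
```

The point of the design is the separation of concerns: the representation is computed once per sequence and has the same length for any input, so the model on top can stay small and be retrained cheaply for each new prediction task.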

Similar articles

1
Unified rational protein engineering with sequence-based deep representation learning.
Nat Methods. 2019 Dec;16(12):1315-1322. doi: 10.1038/s41592-019-0598-1. Epub 2019 Oct 21.
2
ECNet is an evolutionary context-integrated deep learning framework for protein engineering.
Nat Commun. 2021 Sep 30;12(1):5743. doi: 10.1038/s41467-021-25976-8.
3
Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering.
Cell Res. 2024 Sep;34(9):630-647. doi: 10.1038/s41422-024-00989-2. Epub 2024 Jul 5.
4
Low-N protein engineering with data-efficient deep learning.
Nat Methods. 2021 Apr;18(4):389-396. doi: 10.1038/s41592-021-01100-y. Epub 2021 Apr 7.
5
A Gradient of Sitewise Diversity Promotes Evolutionary Fitness for Binder Discovery in a Three-Helix Bundle Protein Scaffold.
Biochemistry. 2017 Mar 21;56(11):1656-1671. doi: 10.1021/acs.biochem.6b01142. Epub 2017 Mar 9.
6
Protein sequence design with a learned potential.
Nat Commun. 2022 Feb 8;13(1):746. doi: 10.1038/s41467-022-28313-9.
7
Anticancer peptides prediction with deep representation learning features.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab008.
8
Context-aware geometric deep learning for protein sequence design.
Nat Commun. 2024 Jul 25;15(1):6273. doi: 10.1038/s41467-024-50571-y.
9
Protein Engineering with Lightweight Graph Denoising Neural Networks.
J Chem Inf Model. 2024 May 13;64(9):3650-3661. doi: 10.1021/acs.jcim.4c00036. Epub 2024 Apr 17.
10
prPred-DRLF: Plant R protein predictor using deep representation learning features.
Proteomics. 2022 Jan;22(1-2):e2100161. doi: 10.1002/pmic.202100161. Epub 2021 Oct 14.

Cited by

1
Protein Language Model Identifies Disordered, Conserved Motifs Implicated in Phase Separation.
bioRxiv. 2025 Jul 23:2024.12.12.628175. doi: 10.1101/2024.12.12.628175.
2
Safe model based optimization balancing exploration and reliability for protein sequence design.
Sci Rep. 2025 Jul 29;15(1):27568. doi: 10.1038/s41598-025-12568-5.
3
In silico prediction of variant effects: promises and limitations for precision plant breeding.
Theor Appl Genet. 2025 Jul 28;138(8):193. doi: 10.1007/s00122-025-04973-1.
6
GOLF: A Generative AI Framework for Pathogenicity Prediction of Myocilin OLF Variants.
bioRxiv. 2025 Jun 24:2025.06.17.660210. doi: 10.1101/2025.06.17.660210.
7
Harnessing deep learning for proteome-scale detection of amyloid signaling motifs.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i420-i428. doi: 10.1093/bioinformatics/btaf200.
8
Locality-aware pooling enhances protein language model performance across varied applications.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i217-i226. doi: 10.1093/bioinformatics/btaf178.
9
Fine-Tuning Protein Language Models Unlocks the Potential of Underrepresented Viral Proteomes.
bioRxiv. 2025 Jun 11:2025.04.17.649224. doi: 10.1101/2025.04.17.649224.
10
Protein Structure-Function Relationship: A Kernel-PCA Approach for Reaction Coordinate Identification.
J Chem Theory Comput. 2025 Jul 22;21(14):7122-7130. doi: 10.1021/acs.jctc.5c00483. Epub 2025 Jul 14.
