
Protein design and variant prediction using autoregressive generative models.

Affiliations

Department of Systems Biology, Harvard Medical School, Boston, MA, USA.

insitro, South San Francisco, CA, USA.

Publication information

Nat Commun. 2021 Apr 23;12(1):2403. doi: 10.1038/s41467-021-22732-w.

DOI: 10.1038/s41467-021-22732-w
PMID: 33893299
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8065141/

Abstract

The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regions. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-art prediction of missense and indel effects and we successfully design and test a diverse 10-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.
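The alignment-free scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper describes a deep generative model, whereas the stand-in below is a toy first-order (bigram) autoregressive model, and the sequences, the pseudocount, and all function names are hypothetical. What it shares with the abstract's description is the factorization log p(x) = Σ log p(x_i | x_<i), which assigns a likelihood to sequences of any length, so substitutions and indels can be scored the same way without a multiple sequence alignment.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"  # sentinel tokens marking sequence boundaries


def fit_bigram(sequences, pseudocount=1.0):
    """Estimate p(next | previous) from unaligned sequences.

    Toy stand-in for a deep autoregressive model (assumption, not the
    paper's architecture). Add-k smoothing keeps unseen transitions finite.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        tokens = START + seq + END
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1.0
    vocab = AMINO_ACIDS + END
    probs = {}
    for prev in START + AMINO_ACIDS:
        total = sum(counts[prev].values()) + pseudocount * len(vocab)
        probs[prev] = {
            nxt: (counts[prev][nxt] + pseudocount) / total for nxt in vocab
        }
    return probs


def log_likelihood(probs, seq):
    """Sum of log p(x_i | x_{i-1}) over the sequence, including the end token."""
    tokens = START + seq + END
    return sum(math.log(probs[prev][nxt]) for prev, nxt in zip(tokens, tokens[1:]))


def variant_score(probs, wild_type, variant):
    """Log-likelihood ratio of variant to wild type; lower suggests less fit."""
    return log_likelihood(probs, variant) - log_likelihood(probs, wild_type)


# Hypothetical, made-up family of unaligned sequences with differing lengths.
family = ["MKTAYIAK", "MKTAYLAK", "MKSAYIAKQ", "MKTAYIAR", "MKTAFIAK"]
model = fit_bigram(family)
wild_type = "MKTAYIAK"

print(variant_score(model, wild_type, "MKTAYLAK"))  # substitution seen in the family
print(variant_score(model, wild_type, "MKTAWIAK"))  # substitution never seen
print(variant_score(model, wild_type, "MKTAIAK"))   # single-residue deletion
```

Because the score is a ratio of whole-sequence likelihoods, the deletion variant needs no special handling. A real model would condition on the full preceding context rather than only the previous residue.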

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56f2/8065141/875eec8e7fdf/41467_2021_22732_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56f2/8065141/556f753423a5/41467_2021_22732_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56f2/8065141/bf7c4688c702/41467_2021_22732_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56f2/8065141/e17051a4bf29/41467_2021_22732_Fig4_HTML.jpg
