The generative capacity of probabilistic protein sequence models.

Affiliations

Center for Biophysics and Computational Biology, Temple University, Philadelphia, 19122, USA.

Institute for Computational Molecular Science, Temple University, Philadelphia, 19122, USA.

Publication information

Nat Commun. 2021 Nov 2;12(1):6302. doi: 10.1038/s41467-021-26529-9.

DOI:10.1038/s41467-021-26529-9
PMID:34728624
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8563988/
Abstract

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.
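The contrast the abstract draws between a site-independent model and a pairwise Potts model can be illustrated with a small sketch. This is not the authors' code or their evaluation statistics. It is a minimal toy, assuming an integer-encoded alignment, a hypothetical alphabet size `q`, and one common Potts sign convention, E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j) with P(S) proportional to exp(-E(S)). All function names are illustrative. It shows that sequences resampled from per-column frequencies alone lose the pairwise covariation present in the original alignment.

```python
import numpy as np


def pair_covariation(msa, q):
    """Return C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b), shape (L, L, q, q).

    msa is an integer array of shape (N, L) with entries in [0, q).
    """
    n = msa.shape[0]
    onehot = np.eye(q)[msa]                                   # (N, L, q)
    f_pair = np.einsum("nia,njb->ijab", onehot, onehot) / n   # pair frequencies
    f_site = onehot.mean(axis=0)                              # (L, q)
    return f_pair - np.einsum("ia,jb->ijab", f_site, f_site)


def sample_site_independent(msa, q, n, rng):
    """Draw n sequences, sampling each column independently from its frequencies."""
    length = msa.shape[1]
    cols = []
    for i in range(length):
        freqs = np.bincount(msa[:, i], minlength=q) / msa.shape[0]
        cols.append(rng.choice(q, size=n, p=freqs))
    return np.stack(cols, axis=1)


def potts_energy(seq, h, J):
    """Pairwise Potts energy under the assumed sign convention.

    h has shape (L, q); J has shape (L, L, q, q) and only i < j entries are used.
    """
    length = len(seq)
    energy = h[np.arange(length), seq].sum()
    for i in range(length):
        for j in range(i + 1, length):
            energy += J[i, j, seq[i], seq[j]]
    return float(energy)


# Toy "natural" alignment: columns 0 and 1 are perfectly coupled, column 2 is free.
rng = np.random.default_rng(0)
q, n_seq = 4, 2000
col0 = rng.integers(0, q, size=n_seq)
toy_msa = np.stack([col0, col0, rng.integers(0, q, size=n_seq)], axis=1)

# Resample from the site-independent model and compare covariation strength.
indep_msa = sample_site_independent(toy_msa, q, n_seq, rng)
c_target = np.linalg.norm(pair_covariation(toy_msa, q)[0, 1])
c_indep = np.linalg.norm(pair_covariation(indep_msa, q)[0, 1])
print(f"covariation norm, toy alignment:    {c_target:.3f}")
print(f"covariation norm, site-independent: {c_indep:.3f}")
```

In this toy the site-independent sample reproduces the column frequencies but not the coupling between columns 0 and 1. A model with pairwise terms, such as the Potts energy sketched in `potts_energy`, has parameters that can encode that coupling. The paper's question goes further, asking whether statistics of order higher than two are also reproduced.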

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54eb/8563988/9d31c190eea4/41467_2021_26529_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54eb/8563988/d7729ab01273/41467_2021_26529_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54eb/8563988/a7fa990a5c6d/41467_2021_26529_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54eb/8563988/6a85e281cf25/41467_2021_26529_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/54eb/8563988/152e001ceaa6/41467_2021_26529_Fig5_HTML.jpg
