
Sparse generative modeling via parameter reduction of Boltzmann machines: Application to protein-sequence families.

Authors

Barrat-Charlaix Pierre, Muntoni Anna Paola, Shimagaki Kai, Weigt Martin, Zamponi Francesco

Affiliations

Biozentrum, Universität Basel, Switzerland, Swiss Institute of Bioinformatics, Basel 4056, Switzerland.

Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino 10129, Italy.

Publication information

Phys Rev E. 2021 Aug;104(2-1):024407. doi: 10.1103/PhysRevE.104.024407.

DOI:10.1103/PhysRevE.104.024407
PMID:34525554
Abstract

Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings into the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful for predicting residue contacts in the three-dimensional structure, mutational effects, and generating new functional sequences. However, the resulting PM suffers from important overfitting effects: many couplings are small, noisy, and hardly interpretable; the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than 90% of the PM couplings, while preserving the predictive and generative properties of the original dense PM, and the resulting model is far away from criticality, hence more robust to noise.
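The decimation idea in the abstract can be illustrated with a small sketch. It assumes the standard pairwise Potts form, with energy E(a) = -Σ_i h_i(a_i) - Σ_{i<j} J_ij(a_i, a_j), and a toy model with random parameters. The abstract does not spell out the information-based criterion, so the score used here (the absolute coupling value) is only a stand-in, and the function names `potts_energy` and `decimate_once` are hypothetical. The sketch also skips the re-fitting of the remaining parameters that a controlled iterative procedure would presumably perform between rounds.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy E(a) = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j) of one sequence."""
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return float(e)

def decimate_once(J, mask, score, fraction=0.1):
    """Zero out the lowest-scoring `fraction` of the still-active couplings.

    `mask` is a boolean array shaped like J marking active couplings.
    `score` is a same-shaped array where higher means more significant
    (a placeholder for the paper's information-based criterion).
    Returns new (J, mask) without modifying the inputs.
    """
    active = np.flatnonzero(mask)
    if len(active) == 0:
        return J.copy(), mask.copy()
    n_remove = max(1, int(len(active) * fraction))
    # Rank only the active couplings and pick the weakest ones.
    order = np.argsort(score.ravel()[active])
    drop = active[order[:n_remove]]
    new_mask = mask.ravel().copy()
    new_mask[drop] = False
    new_mask = new_mask.reshape(mask.shape)
    return J * new_mask, new_mask

# Toy model: L sites, q states per site, random fields and couplings.
rng = np.random.default_rng(0)
L, q = 4, 3
h = rng.normal(size=(L, q))
J = rng.normal(size=(L, L, q, q))

# Only pairs with i < j carry couplings.
mask = np.zeros(J.shape, dtype=bool)
for i in range(L):
    for j in range(i + 1, L):
        mask[i, j] = True
J = J * mask
n_total = int(mask.sum())

# Decimate until at most 10% of the original couplings remain.
while mask.sum() > 0.1 * n_total:
    J, mask = decimate_once(J, mask, np.abs(J), fraction=0.2)
    # Assumption: a real procedure would re-learn h and the surviving J here.

sparse_energy = potts_energy([0, 1, 2, 0], h, J)
```

Removing a fixed fraction per round, rather than everything at once, mirrors the "controlled iterative" wording of the abstract. Any quantitative behaviour of this toy example says nothing about the results reported for real protein families.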


