adabmDCA: adaptive Boltzmann machine learning for biological sequences.

Affiliations

Statistical Inference and Biological Modeling Group, Italian Institute for Genomic Medicine, Candiolo, Italy.

Department of Applied Science and Technology, Politecnico di Torino, Turin, Italy.

Publication information

BMC Bioinformatics. 2021 Oct 29;22(1):528. doi: 10.1186/s12859-021-04441-9.

DOI:10.1186/s12859-021-04441-9

PMID:34715775

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8555268/

Abstract

BACKGROUND

Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases, which account for residue conservation, and pairwise terms, which model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico.
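As an illustration of the parametrization described above, the following minimal Python sketch assumes the Potts-model form commonly used in direct-coupling analysis: a sequence s of length L receives the energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), where h holds the local biases and J the pairwise terms. The function names `potts_energy` and `contact_scores`, the array layouts, and the use of the Frobenius norm with average-product correction as a contact score are assumptions made for this example; they are not taken from the abstract and are not the adabmDCA interface.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    seq: integer-encoded sequence of length L (hypothetical encoding)
    h:   local biases, shape (L, q)
    J:   pairwise terms, shape (L, L, q, q), with J[i, j, a, b] == J[j, i, b, a]
    """
    seq = np.asarray(seq)
    idx = np.arange(len(seq))
    field_term = h[idx, seq].sum()
    # pair[i, j] = J[i, j, s_i, s_j] for every pair of positions
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair, k=1).sum()  # count each pair i < j once
    return float(-field_term - coupling_term)

def contact_scores(J):
    """Frobenius norm of each coupling block with average-product correction.

    Assumption: this is a common DCA scoring choice; gauge fixing and the
    exclusion of the gap state, often applied first, are omitted here.
    """
    L = J.shape[0]
    F = np.sqrt((J ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(F, 0.0)
    row_mean = F.sum(axis=1) / (L - 1)
    total_mean = F.sum() / (L * (L - 1))
    S = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(S, 0.0)
    return S
```

Under these assumptions, the highest-scoring position pairs in the matrix returned by `contact_scores` would be read as predicted contacts, and differences in `potts_energy` between a mutant and a reference sequence would serve as a proxy for mutational effects.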

RESULTS

Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we trained three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.

CONCLUSIONS

The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of both the inferred contact map and the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning; the latter allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time. The code also allows irrelevant parameters to be pruned using an information-based criterion.
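To make the distinction between the two learning modes more concrete, here is a toy Python sketch of Boltzmann machine learning by gradient ascent, in which the model statistics are estimated with Metropolis sampling. It rests on assumptions that the abstract does not confirm: the `persistent` flag (chains carried over between epochs versus restarted from random sequences and run for a fixed, short number of sweeps) is only one plausible reading of equilibrium versus out-of-equilibrium training, and all function names and default values are hypothetical rather than part of adabmDCA. Pruning is not shown.

```python
import numpy as np

def one_hot_stats(X, q):
    """Single-site and pairwise frequencies of integer-encoded sequences X, shape (M, L)."""
    M = X.shape[0]
    oh = np.eye(q)[X]                                  # (M, L, q)
    f1 = oh.mean(axis=0)                               # (L, q)
    f2 = np.einsum("mia,mjb->ijab", oh, oh) / M        # (L, L, q, q)
    return f1, f2

def metropolis_sweeps(X, h, J, n_sweeps, rng):
    """Apply n_sweeps * L single-site Metropolis updates to all chains in X (in place)."""
    M, L = X.shape
    q = h.shape[1]
    for _ in range(n_sweeps * L):
        i = rng.integers(L)
        new = rng.integers(q, size=M)                  # proposed state for each chain
        old = X[:, i].copy()
        dE = -(h[i, new] - h[i, old])                  # change in the bias term
        for j in range(L):
            if j != i:                                 # change in the pairwise terms
                dE -= J[i, j, new, X[:, j]] - J[i, j, old, X[:, j]]
        accept = rng.random(M) < np.exp(np.minimum(0.0, -dE))
        X[accept, i] = new[accept]
    return X

def train(data, q, n_epochs=30, n_chains=200, n_sweeps=5, lr=0.1,
          persistent=True, seed=0):
    """Gradient ascent on the log-likelihood: data statistics minus sampled statistics."""
    rng = np.random.default_rng(seed)
    L = data.shape[1]
    f1, f2 = one_hot_stats(data, q)
    h = np.zeros((L, q))
    J = np.zeros((L, L, q, q))
    off_diag = ~np.eye(L, dtype=bool)                  # no self-couplings
    chains = rng.integers(q, size=(n_chains, L))
    for _ in range(n_epochs):
        if not persistent:
            # Assumed out-of-equilibrium variant: restart chains every epoch
            chains = rng.integers(q, size=(n_chains, L))
        chains = metropolis_sweeps(chains, h, J, n_sweeps, rng)
        p1, p2 = one_hot_stats(chains, q)
        h += lr * (f1 - p1)
        J += lr * (f2 - p2) * off_diag[:, :, None, None]
    return h, J
```

In this sketch the cost of each epoch is fixed by `n_sweeps`, which is what makes a short-run variant attractive when waiting for the chains to equilibrate would be too slow; the trade-off is that a model trained this way should be sampled with the same protocol that was used during training.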

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/caac/8555268/89914fcde357/12859_2021_4441_Fig1_HTML.jpg
