adabmDCA: adaptive Boltzmann machine learning for biological sequences.

Affiliations

Statistical Inference and Biological Modeling Group, Italian Institute for Genomic Medicine, Candiolo, Italy.

Department of Applied Science and Technology, Politecnico di Torino, Turin, Italy.

Publication information

BMC Bioinformatics. 2021 Oct 29;22(1):528. doi: 10.1186/s12859-021-04441-9.

Abstract

BACKGROUND

Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized by local biases, which account for residue conservation, and pairwise terms, which model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, these models have also been assessed on their ability to predict mutational effects and to generate functional sequences in silico.
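As an illustration of the parametrization described above, the following is a minimal Python sketch of a Potts-type energy with local biases and pairwise couplings, together with one common way of turning couplings into contact scores (Frobenius norm with average-product correction). The function names, array layouts, and the scoring recipe are assumptions made for this example and are not taken from the adabmDCA code; the gauge transformation usually applied to the couplings before scoring is omitted for brevity.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy of one integer-encoded sequence under a Potts-type model.

    Hypothetical layout (an assumption of this sketch):
      seq: (L,) integers in [0, q)
      h:   (L, q) local biases
      J:   (L, L, q, q) couplings with J[i, j, a, b] == J[j, i, b, a]
    """
    L = len(seq)
    idx = np.arange(L)
    field_term = h[idx, seq].sum()
    # pair[i, j] = J[i, j, seq[i], seq[j]]; count each pair i < j once.
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair, k=1).sum()
    return -(field_term + coupling_term)

def contact_scores(J):
    """Frobenius norm of each q x q coupling block, with average-product correction."""
    F = np.sqrt((J ** 2).sum(axis=(2, 3)))   # (L, L) coupling strengths
    np.fill_diagonal(F, 0.0)
    row_mean = F.mean(axis=1, keepdims=True)
    apc = row_mean * row_mean.T / F.mean()   # background expected from site averages
    S = F - apc
    np.fill_diagonal(S, 0.0)
    return S
```

Under these assumptions, the highest-scoring pairs (i, j) in the matrix returned by `contact_scores` would be the predicted contacts, and lower values of `potts_energy` correspond to sequences the model considers more probable.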

RESULTS

Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have trained three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.

CONCLUSIONS

The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of both the inferred contact map and the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning; the latter allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time. The code also allows for pruning irrelevant parameters using an information-based criterion.
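To make the learning procedure more concrete, here is a minimal Python sketch of a generic Boltzmann machine learning step: the model's one- and two-site statistics are estimated from Monte Carlo chains and the parameters are moved toward the statistics of the data. All names, the learning rate, and the fixed number of sweeps are assumptions of this example; it is not the adabmDCA implementation, and it does not include the adaptive features or the pruning mentioned above.

```python
import numpy as np

def moments(seqs, q):
    """Single-site (L, q) and pairwise (L, L, q, q) frequencies of (M, L) sequences."""
    x = np.eye(q)[seqs]                                   # one-hot, (M, L, q)
    f1 = x.mean(axis=0)
    f2 = np.einsum("mia,mjb->ijab", x, x) / len(seqs)
    return f1, f2

def metropolis_sweep(chains, h, J, rng):
    """One sweep (L single-site Metropolis proposals) on every chain, in place."""
    M, L = chains.shape
    q = h.shape[1]
    cols = np.arange(L)[None, :]
    for _ in range(L):
        i = rng.integers(L)
        new = rng.integers(q, size=M)
        old = chains[:, i].copy()
        # Coupling of site i (in state new/old) to every other site of each chain.
        c_new = J[i][cols, new[:, None], chains]
        c_old = J[i][cols, old[:, None], chains]
        c_new[:, i] = 0.0
        c_old[:, i] = 0.0
        dE = -(h[i, new] - h[i, old]) - (c_new - c_old).sum(axis=1)
        accept = rng.random(M) < np.exp(np.minimum(0.0, -dE))
        chains[accept, i] = new[accept]

def learning_step(data, chains, h, J, rng, lr=0.05, sweeps=5):
    """One gradient-ascent step on the log-likelihood; updates h, J, chains in place."""
    q = h.shape[1]
    for _ in range(sweeps):
        metropolis_sweep(chains, h, J, rng)
    f1, f2 = moments(data, q)      # statistics of the data
    p1, p2 = moments(chains, q)    # statistics of the model samples
    h += lr * (f1 - p1)
    dJ = lr * (f2 - p2)
    diag = np.arange(J.shape[0])
    dJ[diag, diag] = 0.0           # no self-couplings
    J += dJ
    return np.abs(f1 - p1).max()   # simple convergence indicator
```

In this sketch, how close the sampling is to equilibrium depends on whether `chains` is kept across calls or re-initialized, and on the value of `sweeps`; both are fixed by hand here.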
