adabmDCA: adaptive Boltzmann machine learning for biological sequences.

Author information

Statistical Inference and Biological Modeling Group, Italian Institute for Genomic Medicine, Candiolo, Italy.

Department of Applied Science and Technology, Politecnico di Torino, Turin, Italy.

Publication information

BMC Bioinformatics. 2021 Oct 29;22(1):528. doi: 10.1186/s12859-021-04441-9.

Abstract

BACKGROUND

Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases, which account for residue conservation, and pairwise terms, which model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico.
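To make the parametrization concrete, the following is a minimal sketch of the energy such a model assigns to one sequence. It assumes the common sign convention E(a) = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j), so that lower energy means higher probability. The function name `potts_energy`, the nested-list layout of `h` and `J`, and the toy values are illustrative assumptions and are not taken from the adabmDCA code.

```python
from typing import List, Sequence


def potts_energy(seq: Sequence[int],
                 h: List[List[float]],
                 J: List[List[List[List[float]]]]) -> float:
    """Energy of an integer-encoded sequence under a pairwise model.

    h[i][a]       : local bias for symbol a at position i (conservation)
    J[i][j][a][b] : pairwise term for symbols a, b at positions i < j
                    (coevolution); only the i < j entries are read.
    """
    length = len(seq)
    energy = 0.0
    for i in range(length):
        energy -= h[i][seq[i]]
        for j in range(i + 1, length):
            energy -= J[i][j][seq[i]][seq[j]]
    return energy


# Toy model: 3 positions, alphabet of 2 symbols (hypothetical values).
L, q = 3, 2
h = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
J[0][1][1][0] = 2.0  # favour symbol 1 at position 0 with symbol 0 at position 1

e_favoured = potts_energy([1, 0, 0], h, J)
e_neutral = potts_energy([0, 1, 1], h, J)
```

In this toy model the sequence that matches both the biases and the coupling has the lower energy. For real protein families the alphabet would have 21 symbols (20 amino acids plus a gap), and for RNA 5.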

RESULTS

Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we learned three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain.

CONCLUSIONS

The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of both the inferred contact map and the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, the latter allowing accurate and lossless training when equilibrium learning is prohibitive in terms of computational time, and it allows irrelevant parameters to be pruned using an information-based criterion.
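As a rough illustration of what Boltzmann machine learning involves, the sketch below performs one gradient-ascent step of the log-likelihood. Each parameter moves in proportion to the difference between a frequency measured on the data and the same frequency measured on samples drawn from the current model. This is the generic moment-matching update, not adabmDCA's actual implementation: the function names, the learning rate, and the toy inputs are assumptions, and the Monte Carlo sampler that would produce `samples` is left out entirely.

```python
from typing import List

Seqs = List[List[int]]


def site_freqs(seqs: Seqs, q: int) -> List[List[float]]:
    """f[i][a]: fraction of sequences with symbol a at position i."""
    length, w = len(seqs[0]), 1.0 / len(seqs)
    f = [[0.0] * q for _ in range(length)]
    for s in seqs:
        for i, a in enumerate(s):
            f[i][a] += w
    return f


def pair_freqs(seqs: Seqs, q: int) -> List[List[List[List[float]]]]:
    """f[i][j][a][b]: fraction with a at i and b at j (filled for i < j)."""
    length, w = len(seqs[0]), 1.0 / len(seqs)
    f = [[[[0.0] * q for _ in range(q)] for _ in range(length)]
         for _ in range(length)]
    for s in seqs:
        for i in range(length):
            for j in range(i + 1, length):
                f[i][j][s[i]][s[j]] += w
    return f


def gradient_step(h, J, data: Seqs, samples: Seqs, q: int, lr: float) -> None:
    """Update h and J in place: parameter += lr * (data freq - model freq)."""
    fd1, fm1 = site_freqs(data, q), site_freqs(samples, q)
    fd2, fm2 = pair_freqs(data, q), pair_freqs(samples, q)
    length = len(h)
    for i in range(length):
        for a in range(q):
            h[i][a] += lr * (fd1[i][a] - fm1[i][a])
        for j in range(i + 1, length):
            for a in range(q):
                for b in range(q):
                    J[i][j][a][b] += lr * (fd2[i][j][a][b] - fm2[i][j][a][b])


# Toy example: 2 positions, 2 symbols, parameters initialised to zero.
q_toy = 2
h_toy = [[0.0, 0.0], [0.0, 0.0]]
J_toy = [[[[0.0] * q_toy for _ in range(q_toy)] for _ in range(2)]
         for _ in range(2)]
data_toy = [[0, 1], [0, 1]]     # stand-in for an alignment
samples_toy = [[0, 0], [1, 1]]  # stand-in for sequences sampled from the model
gradient_step(h_toy, J_toy, data_toy, samples_toy, q_toy, lr=0.5)
```

Here the samples are simply passed in. In practice they come from Markov chains, and whether those chains are run to equilibrium before the frequencies are measured is, loosely speaking, the distinction between the two learning regimes mentioned above.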
