Boltzmann Machine Learning and Regularization Methods for Inferring Evolutionary Fields and Couplings From a Multiple Sequence Alignment.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):328-342. doi: 10.1109/TCBB.2020.2993232. Epub 2022 Feb 3.

Abstract

The inverse Potts problem, which is to infer a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods, and how to tune regularization parameters, to correctly infer interactions in Boltzmann machine learning. With L2 regularization for fields, group L1 regularization for couplings is shown to be very effective for sparse couplings in comparison with L2 and L1. The two regularization parameters are tuned so that the sample and ensemble averages of the evolutionary energy take equal values. Both averages change smoothly and converge, but their learning profiles differ considerably between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings and to use a soft-thresholding function for group L1. By first inferring interactions from protein sequences and then from Monte Carlo samples, it is shown that the fields and couplings can be well recovered, but that recovering the pairwise correlations at the resolution of a total energy is harder for natural proteins than for protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
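The regularized updates the abstract refers to can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain proximal-gradient step instead of their modified Adam, assumes the energy convention E(σ) = −Σ_i h_i(σ_i) − Σ_{i<j} J_ij(σ_i, σ_j), assumes sequences are encoded as integer states 0..q−1, and all function names are hypothetical. The key idea shown is that group L1 on a coupling block J_ij acts through group soft-thresholding, which shrinks the whole q×q block and zeroes it when its Frobenius norm falls below the threshold.

```python
import math

# Assumed encoding (hypothetical): each aligned sequence is a list of integer
# states 0..q-1, h[i][a] is the field of state a at site i, and J maps a site
# pair (i, j), i < j, to a q x q coupling block J[(i, j)][a][b].


def potts_energy(seq, h, J):
    """Evolutionary energy under the assumed convention E = -sum h - sum J."""
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), block in J.items():
        energy -= block[seq[i]][seq[j]]
    return energy


def mean_energy(seqs, h, J):
    """Average energy over a set of sequences (an MSA or Monte Carlo samples)."""
    return sum(potts_energy(s, h, J) for s in seqs) / len(seqs)


def field_gradient(f_sample, p_model):
    """Gradient of the negative log-likelihood w.r.t. h_i: model minus sample marginals."""
    return [p - f for f, p in zip(f_sample, p_model)]


def l2_field_step(h_i, grad_i, step, lam_h):
    """Plain gradient step on one site's fields with penalty lam_h * ||h_i||^2."""
    return [x - step * (g + 2.0 * lam_h * x) for x, g in zip(h_i, grad_i)]


def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_F (group L1 on one block J_ij).

    Shrinks the whole block toward zero and returns an all-zero block when its
    Frobenius norm does not exceed the threshold, giving sparsity at the level
    of site pairs rather than individual amino acid pairs.
    """
    norm = math.sqrt(sum(x * x for row in block for x in row))
    if norm <= threshold:
        return [[0.0] * len(row) for row in block]
    scale = 1.0 - threshold / norm
    return [[scale * x for x in row] for row in block]


def proximal_coupling_step(block, grad_block, step, lam_J):
    """Gradient step on J_ij, then group soft-thresholding (proximal gradient)."""
    moved = [
        [x - step * g for x, g in zip(row, grow)]
        for row, grow in zip(block, grad_block)
    ]
    return group_soft_threshold(moved, step * lam_J)
```

For example, `group_soft_threshold([[3.0, 0.0], [0.0, 4.0]], 1.0)` sees a Frobenius norm of 5, scales the block by 0.8, and returns approximately `[[2.4, 0.0], [0.0, 3.2]]`, while a block with norm 0.5 and threshold 1.0 is set exactly to zero. Under the tuning criterion the abstract describes, the two penalty weights (here `lam_h` and `lam_J`) would be adjusted until `mean_energy` over the alignment matches `mean_energy` over Monte Carlo samples from the inferred model; the sampler itself is not shown.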
