Adhikari Subinoy, Mondal Jagannath
Tata Institute of Fundamental Research, Hyderabad 500046, India.
J Chem Theory Comput. 2025 Jul 8;21(13):6367-6379. doi: 10.1021/acs.jctc.5c00365. Epub 2025 Jun 16.
Proteins traverse intricate conformational landscapes whose transitions and long-lived states hold the key to their biological function. However, unraveling these dynamics remains a formidable challenge. An emerging approach is to train deep variational autoencoders (VAEs) on the conformational ensemble in order to machine-learn the underlying reduced-dimensional representation. However, VAEs are typically trained with a fixed β value of 1, where β is the crucial weighting factor between the reconstruction and regularization terms. This static setup can often lead to posterior collapse, which significantly hinders the model's ability to capture complex protein dynamics accurately. Annealing the β parameter offers a potential alternative for mitigating this issue. However, this approach frequently falls short of fully addressing the problem, mainly because of the arbitrary choice of the upper bound of β and of the annealing schedule. In this work, we propose a new approach that uses the fraction of variation explained (FVE) score to identify the optimal value of β. We demonstrate that annealed VAEs trained at their optimal β in a single cycle consistently outperformed their nonannealed counterparts, as evidenced by higher VAMP-2 (variational approach for Markov processes) and generalized matrix Rayleigh quotient (GMRQ) scores and by distinct free energy surface minima, for both folded and intrinsically disordered proteins. The improved latent space representations substantially enhance state space discretization, thereby refining Markov state models and providing more accurate insights into conformational landscapes, as reflected in distinct contact maps. Together, this development provides a systematic approach to optimizing the balance between the reconstruction and regularization aspects of VAEs, which would augment the potential of annealed VAEs in resolving complex conformational landscapes.
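The abstract names three ingredients: a loss in which β weights the regularization (KL) term against reconstruction, β annealed over a single cycle, and the FVE score used to judge the choice of β. The NumPy sketch below illustrates those pieces under stated assumptions and is not the authors' implementation: it assumes a diagonal Gaussian encoder with a standard-normal prior, a summed squared-error reconstruction term, a linear ramp-then-hold shape for the single annealing cycle, and the common definition of FVE as one minus the ratio of residual to total sum of squares. All function names and the `ramp_fraction` default are hypothetical, and the abstract does not state the exact schedule shape or the FVE-based rule used to pick β.

```python
import numpy as np


def kl_gaussian_standard_normal(mu, logvar):
    """Per-sample KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder (assumed)."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


def beta_vae_loss(x, x_recon, mu, logvar, beta):
    """Batch mean of reconstruction error + beta * KL regularizer.

    The summed squared error assumes a Gaussian decoder (an illustrative choice).
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)
    kl = kl_gaussian_standard_normal(mu, logvar)
    return float(np.mean(recon + beta * kl))


def linear_beta_schedule(step, total_steps, beta_max, ramp_fraction=0.5):
    """Hypothetical single-cycle annealing: ramp beta linearly from 0 to
    beta_max over the first ramp_fraction of training, then hold it."""
    ramp_steps = max(1, int(ramp_fraction * total_steps))
    return beta_max * min(1.0, step / ramp_steps)


def fraction_of_variation_explained(x, x_recon):
    """FVE = 1 - (residual sum of squares) / (total sum of squares about the mean)."""
    residual = np.sum((x - x_recon) ** 2)
    total = np.sum((x - x.mean(axis=0)) ** 2)
    return float(1.0 - residual / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 10))                      # stand-in for input features
    x_recon = x + 0.1 * rng.normal(size=x.shape)        # stand-in for decoder output
    mu, logvar = rng.normal(size=(200, 2)), np.zeros((200, 2))

    for step in (0, 250, 500, 1000):
        beta = linear_beta_schedule(step, total_steps=1000, beta_max=2.0)
        print(step, beta, beta_vae_loss(x, x_recon, mu, logvar, beta))
    print("FVE:", fraction_of_variation_explained(x, x_recon))
```

In practice one would train a separate model for each candidate upper bound of β and compare their FVE scores on held-out data; how that comparison is turned into an "optimal" β is a choice the abstract leaves to the full paper.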