Monroe Jacob I, Shen Vincent K
Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-8320, USA.
J Chem Phys. 2022 Sep 7;157(9):094116. doi: 10.1063/5.0105120.
Variational autoencoders (VAEs) are rapidly gaining popularity within molecular simulation for discovering low-dimensional, or latent, representations, which are critical for both analyzing and accelerating simulations. However, it remains unclear how the information a VAE learns is connected to its probabilistic structure and, in turn, its loss function. Previous studies have focused on feature engineering, ad hoc modifications to loss functions, or adjustment of the prior to enforce desirable latent space properties. By applying effectively arbitrarily flexible priors via normalizing flows, we focus instead on how adjusting the structure of the decoding model impacts the learned latent coordinate. We systematically adjust the power and flexibility of the decoding distribution, observing that this has a significant impact on the structure of the latent space as measured by a suite of metrics developed in this work. By also varying weights on separate terms within each VAE loss function, we show that the level of detail encoded can be further tuned. This provides practical guidance for utilizing VAEs to extract varying resolutions of low-dimensional information from molecular dynamics and Monte Carlo simulations.
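As a rough illustration of what varying the weights on separate terms of a VAE loss can look like, the sketch below computes a weighted sum of a reconstruction term and a Kullback-Leibler term. It is not the authors' implementation. It assumes a diagonal Gaussian encoder, a unit-variance Gaussian decoder, and a standard normal prior, whereas the paper uses flexible priors built from normalizing flows. The function names and the parameters `recon_weight` and `kl_weight` are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption: standard normal prior, used here only for simplicity.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def gaussian_recon_nll(x, x_mean, log_var_dec=0.0):
    """Per-sample negative log-likelihood of x under a factorized Gaussian decoder.

    Assumption: a single fixed decoder log-variance (default 0, i.e. unit variance).
    """
    return 0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var_dec + (x - x_mean) ** 2 / np.exp(log_var_dec),
        axis=-1,
    )


def weighted_vae_loss(x, x_mean, mu, log_var, recon_weight=1.0, kl_weight=1.0):
    """Batch-averaged weighted loss: recon_weight * NLL + kl_weight * KL.

    recon_weight and kl_weight are hypothetical knobs standing in for the
    per-term weights mentioned in the abstract.
    """
    recon = gaussian_recon_nll(x, x_mean)
    kl = gaussian_kl_to_standard_normal(mu, log_var)
    return float(np.mean(recon_weight * recon + kl_weight * kl))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))          # 4 configurations, 3 features each
    x_mean = x + 0.1                     # stand-in for decoder output
    mu = rng.normal(size=(4, 2))         # 2-dimensional latent means
    log_var = np.zeros((4, 2))           # unit encoder variance
    for kl_weight in (0.1, 1.0, 10.0):
        print(kl_weight, weighted_vae_loss(x, x_mean, mu, log_var, kl_weight=kl_weight))
```

In this simplified setting, raising `kl_weight` penalizes encodings that stray from the prior more heavily, which is one common way to trade latent-space regularity against reconstruction detail.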