Chen Haochuan, Chipot Christophe
Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France.
Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
QRB Discov. 2023 Jan 6;4:e2. doi: 10.1017/qrd.2022.23. eCollection 2023.
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which, in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations, remain unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as a closely related model, the variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time-series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship between time-series-based models and committor probabilities.
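The contrast between high-variance and slow modes can be illustrated with a minimal linear sketch. It is not the authors' method: principal component analysis (PCA) stands in for an AE (a linear AE trained with a squared-error loss spans the same subspace as PCA), and time-lagged independent component analysis (TICA) stands in for the time-series-based models. The toy trajectory, the function names and all parameters (`phi`, `lag`, the amplitudes) are assumptions chosen for illustration only.

```python
import numpy as np


def simulate(n_steps=20000, phi=0.99, seed=0):
    """Toy 2D trajectory (an assumption, not data from the paper):
    column 0 is slow but has small variance, column 1 is fast with large variance."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n_steps)
    slow = np.zeros(n_steps)
    for t in range(1, n_steps):
        # AR(1) process with unit stationary variance and long memory
        slow[t] = phi * slow[t - 1] + np.sqrt(1.0 - phi**2) * noise[t]
    slow *= 0.5                              # small amplitude
    fast = 3.0 * rng.normal(size=n_steps)    # uncorrelated in time, large amplitude
    return np.column_stack([slow, fast])


def pca_direction(X):
    """Unit vector of maximal variance (leading eigenvector of the covariance)."""
    Xc = X - X.mean(axis=0)
    c0 = Xc.T @ Xc / len(Xc)
    _, vecs = np.linalg.eigh(c0)             # eigenvalues in ascending order
    return vecs[:, -1]


def tica_direction(X, lag):
    """Unit vector of maximal autocorrelation at the given lag.
    Solves C_lag v = lambda C_0 v with symmetrised estimators via whitening."""
    Xc = X - X.mean(axis=0)
    a, b = Xc[:-lag], Xc[lag:]
    n = len(a)
    c0 = (a.T @ a + b.T @ b) / (2 * n)
    ct = (a.T @ b + b.T @ a) / (2 * n)
    vals, vecs = np.linalg.eigh(c0)
    whiten = vecs @ np.diag(vals**-0.5) @ vecs.T   # C_0^(-1/2)
    _, u = np.linalg.eigh(whiten @ ct @ whiten)
    d = whiten @ u[:, -1]                    # map back to the original coordinates
    return d / np.linalg.norm(d)


if __name__ == "__main__":
    X = simulate()
    print("PCA direction :", pca_direction(X))
    print("TICA direction:", tica_direction(X, lag=10))
```

In this toy setting, the PCA direction should align with the fast, high-variance coordinate, whereas the TICA direction should align with the slow, low-variance one. Nonlinear AEs and the neural-network models reviewed here are more expressive, so this sketch only conveys the underlying intuition.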