Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland.
J Chem Phys. 2021 Feb 28;154(8):084106. doi: 10.1063/5.0025797.
The combination of Markov state modeling (MSM) and molecular dynamics (MD) simulations has proven in recent years to be a valuable approach for unraveling the slow processes of increasingly complex molecular systems. While the algorithms for intermediate steps in the MSM workflow, such as featurization and dimensionality reduction, have been specifically adapted to MD datasets, conventional clustering methods are generally used for the discretization step. This work adds to recent efforts to develop specialized density-based clustering algorithms for the Boltzmann-weighted data from MD simulations. We introduce volume-scaled common nearest neighbor (vs-CNN) clustering, an adapted version of the common nearest neighbor (CNN) algorithm. A major advantage of the proposed algorithm is that its density-based criterion links directly to a free-energy notion via Boltzmann inversion. This free-energy perspective allows a straightforward hierarchical scheme for identifying conformational clusters at different levels of the generally rugged free-energy landscapes of complex molecular systems.
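The abstract does not spell out the vs-CNN criterion itself. As a hedged illustration only, the sketch below implements the plain CNN linking rule as it is commonly described (two points within a cutoff r that share at least c common neighbours end up in the same cluster) together with a generic Boltzmann inversion F = -kT ln(rho) of a simple radius-based density estimate. The function names, the parameters r, c, min_size and kT, and the density estimator are illustrative assumptions; the volume scaling that distinguishes vs-CNN from CNN is not reproduced here.

```python
import math
from collections import deque


def neighbor_sets(points, r):
    """For each point, the set of indices of other points within distance r."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def cnn_cluster(points, r, c, min_size=2):
    """Plain CNN-style clustering (illustrative sketch, NOT the vs-CNN variant).

    Points i and j are linked when they are within r of each other and share
    at least c common neighbours within r. Clusters are the connected
    components of this link graph; components smaller than min_size are
    labelled -1 (noise).
    """
    nbrs = neighbor_sets(points, r)
    labels = [-1] * len(points)
    visited = [False] * len(points)
    next_label = 0
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        component, queue = [seed], deque([seed])
        while queue:  # breadth-first search over linked pairs
            i = queue.popleft()
            for j in nbrs[i]:
                if not visited[j] and len(nbrs[i] & nbrs[j]) >= c:
                    visited[j] = True
                    component.append(j)
                    queue.append(j)
        if len(component) >= min_size:
            for i in component:
                labels[i] = next_label
            next_label += 1
    return labels


def ball_volume(d, r):
    """Volume of a d-dimensional ball of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def point_free_energies(points, r, kT=1.0):
    """Boltzmann inversion F_i = -kT ln(rho_i), up to an additive constant.

    rho_i is estimated from the number of points inside a ball of radius r
    around point i (an assumed, deliberately simple density estimator).
    """
    nbrs = neighbor_sets(points, r)
    n_total = len(points)
    vol = ball_volume(len(points[0]), r)
    # count includes the point itself so the estimated density is never zero
    return [-kT * math.log((len(s) + 1) / (n_total * vol)) for s in nbrs]
```

Because two linked points must each have at least c neighbours inside the ball of radius r, raising c in this simplified picture lowers an upper bound, -kT ln(c / (N V_d(r))), on the free energy of any clustered point. Re-clustering with increasing c therefore only keeps regions below successively lower free-energy levels, which loosely mirrors the hierarchical idea described in the abstract.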