Biomolecular Dynamics, Institute of Physics, Albert Ludwigs University, 79104 Freiburg, Germany.
J Chem Phys. 2017 Dec 28;147(24):244101. doi: 10.1063/1.4998259.
We develop a dimensionality reduction method for high-dimensional circular data based on a principal component analysis (PCA) of data points on a torus. Adopting a geometrical view of PCA, we introduce various distance measures on a torus and discuss the associated problem of projecting data onto the principal subspaces. The main idea is that the (periodicity-induced) projection error can be minimized by transforming the data such that the maximal gap in the sampling is shifted to the periodic boundary. In a second step, the covariance matrix and its eigendecomposition can be computed in the standard manner. Using molecular dynamics simulations of two well-established biomolecular systems (Aib and villin headpiece), we demonstrate the potential of the method to analyze the dynamics of backbone dihedral angles. The new approach allows for a robust and well-defined construction of metastable states and provides low-dimensional reaction coordinates that accurately describe the free energy landscape. Moreover, it offers a direct interpretation of covariances and principal components in terms of the angular variables. Apart from its application to PCA, the method of maximal gap shifting is general and can be applied to any other dimensionality reduction method for circular data.
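The two-step procedure in the abstract (shift the maximal sampling gap of each angle to the periodic boundary, then run ordinary PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the dihedral angles are given in radians as an array of shape (n_samples, n_angles). The function names `wrap`, `shift_max_gap` and `pca_on_shifted` are hypothetical. Placing the periodic cut at the midpoint of the maximal gap is an assumption made here for concreteness.

```python
import numpy as np


def wrap(x):
    """Map angles in radians onto the interval [-pi, pi)."""
    return (np.asarray(x, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi


def shift_max_gap(theta):
    """Rotate each angular column so its largest sampling gap straddles +/-pi.

    theta: array of shape (n_samples, n_angles), in radians.
    Assumption (not from the source): the cut is placed at the gap midpoint.
    """
    theta = wrap(np.atleast_2d(theta))
    shifted = np.empty_like(theta)
    for j in range(theta.shape[1]):
        s = np.sort(theta[:, j])
        # Append the smallest angle shifted by 2*pi so that the wrap-around
        # gap (from the largest angle back to the smallest) is also considered.
        ext = np.append(s, s[0] + 2.0 * np.pi)
        gaps = np.diff(ext)
        k = int(np.argmax(gaps))
        center = 0.5 * (ext[k] + ext[k + 1])
        # Rotating by (center + pi) moves the gap midpoint onto the boundary.
        shifted[:, j] = wrap(theta[:, j] - center - np.pi)
    return shifted


def pca_on_shifted(theta, n_components=2):
    """Standard PCA on gap-shifted angles.

    Returns eigenvalues, eigenvectors (as columns) and projections onto the
    first n_components principal components, sorted by decreasing variance.
    """
    x = shift_max_gap(theta)
    x = x - x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, x @ evecs[:, :n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.normal(0.0, 0.2, 500)
    # Two correlated toy "dihedrals" fluctuating across the periodic boundary.
    theta = np.column_stack([
        wrap(np.pi + t),
        wrap(np.pi + 0.5 * t + rng.normal(0.0, 0.05, 500)),
    ])
    evals, _, proj = pca_on_shifted(theta)
    print("naive variances:", np.var(theta, axis=0))
    print("eigenvalues after gap shift:", evals)
```

Each shift is a rigid rotation of one angle on its circle, so circular distances between samples are unchanged. The shift only moves the periodic cut into the largest unsampled region. In the toy example, the naive variances are inflated by the samples that wrap around ±π, while the eigenvalues after the shift reflect the actual small fluctuations.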