Duan Mojie, Fan Jue, Li Minghai, Han Li, Huo Shuanghong
Gustaf H. Carlson School of Chemistry and Biochemistry, Clark University, Worcester, MA 01610 USA.
J Chem Theory Comput. 2013 May 14;9(5):2490-2497. doi: 10.1021/ct400052y.
Dimensionality reduction methods have been widely used to study the free energy landscapes and low-free-energy pathways of molecular systems. Nonlinear dimensionality reduction methods have been shown to give better embedding results than linear methods, such as principal component analysis, in some simple systems. In this study, we evaluated several nonlinear methods (locally linear embedding, Isomap, and diffusion maps) as well as principal component analysis, using the equilibrium folding/unfolding trajectory of the second β-hairpin of the B1 domain of streptococcal protein G. The CHARMM parm19 polar hydrogen potential function was used. A series of criteria that reflect different aspects of embedding quality was employed in the evaluation. Our results show that principal component analysis performs no worse than the nonlinear methods on this complex system. No method is a clear winner in all aspects of the evaluation, and each has limitations in some aspect. We emphasize that a fair, informative assessment of an embedding result requires a combination of multiple evaluation criteria rather than any single one. Caution should be used when dimensionality reduction methods are employed, especially when only a few of the top embedding dimensions are used to describe the free energy landscape.
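The recommendation to combine several evaluation criteria can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy helix data, the two candidate one-dimensional coordinates, and the two criteria (a residual variance based on the correlation of pairwise distances, and a k-nearest-neighbor preservation score). The abstract does not list the criteria the authors actually used.

```python
import math


def make_helix(n=60, turns=0.75, pitch=0.1):
    """Return n evenly spaced 3D points on a helix (toy data) and their angles."""
    angles = [2 * math.pi * turns * i / (n - 1) for i in range(n)]
    points = [(math.cos(t), math.sin(t), pitch * t) for t in angles]
    return points, angles


def pairwise_distances(points):
    """Full Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def residual_variance(d_high, d_low):
    """1 - r^2, where r is the Pearson correlation between the original
    and embedded pairwise distances (upper triangle only)."""
    n = len(d_high)
    xs = [d_high[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [d_low[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return 1.0 - cov * cov / (var_x * var_y)


def neighbor_preservation(d_high, d_low, k):
    """Mean fraction of each point's k nearest neighbors in the original
    space that are also among its k nearest neighbors in the embedding."""
    n = len(d_high)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nn_high = set(sorted(others, key=lambda j: d_high[i][j])[:k])
        nn_low = set(sorted(others, key=lambda j: d_low[i][j])[:k])
        total += len(nn_high & nn_low) / k
    return total / n


if __name__ == "__main__":
    points, angles = make_helix()
    d_high = pairwise_distances(points)
    # Two hypothetical 1D embeddings: a linear projection onto x, and the
    # helix angle, standing in for an idealized nonlinear coordinate.
    candidates = {
        "x-projection (linear)": [(p[0],) for p in points],
        "helix angle (nonlinear)": [(t,) for t in angles],
    }
    for name, emb in candidates.items():
        d_low = pairwise_distances(emb)
        print(name,
              "residual variance:", round(residual_variance(d_high, d_low), 3),
              "neighbor preservation:", round(neighbor_preservation(d_high, d_low, k=4), 3))
```

The first score measures global distance agreement and the second measures local neighborhood agreement, so the two need not rank candidate embeddings the same way. That is the kind of disagreement that makes a single criterion an incomplete basis for comparison.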