Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI), Department Bioinformatics, Schloss Birlinghoven, 53754, Sankt Augustin, Germany.
J Mol Model. 2013 Feb;19(2):539-49. doi: 10.1007/s00894-012-1563-4. Epub 2012 Sep 8.
With improvements in computer speed and algorithm efficiency, molecular dynamics (MD) simulations are sampling larger numbers of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in MD simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters compared with clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data spanning the first two through the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating the results, and c) the valuable dynamic and conformational information that can be obtained by combining PC analysis with subsequent clustering.
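The workflow described in the abstract (project the trajectory onto the leading PCs, cluster each subspace, and compare against clustering the complete data) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic stand-ins for trajectory frames, the function names (`pca_project`, `farthest_point_init`, `kmeans`, `rand_index`) are hypothetical, the farthest-point initialization is an assumption chosen only to keep the sketch deterministic, and the Rand index is one possible agreement measure that the abstract does not name. Only K-means is shown; average-linkage clustering is omitted.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def farthest_point_init(X, k):
    """Assumed deterministic seeding: start at frame 0, then repeatedly add
    the frame farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means; returns one integer label per row of X."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def rand_index(a, b):
    """Fraction of frame pairs on which two partitions agree
    (both together or both apart); invariant to label renaming."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))

# Synthetic stand-in for a trajectory: 3 well-separated "conformational
# states", 100 frames each, 30 coordinates per frame (hypothetical sizes).
rng = np.random.default_rng(0)
n_frames, n_coords = 100, 30
means = np.zeros((3, n_coords))
means[0, 0] = means[1, 1] = means[2, 2] = 10.0
X = np.vstack([m + rng.normal(scale=0.5, size=(n_frames, n_coords))
               for m in means])

# Cluster the complete data, then each PC subspace (first 2 to first 5 PCs).
full_labels = kmeans(X, 3)
agreement = {d: rand_index(full_labels, kmeans(pca_project(X, d), 3))
             for d in range(2, 6)}
for d, score in agreement.items():
    print(f"first {d} PCs vs. complete data: Rand index = {score:.3f}")
```

On real trajectory data, `X` would hold one row of aligned coordinates per frame. The same comparison could be repeated with average-linkage clustering, for example through SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`, to check whether cluster membership depends on the chosen subspace.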