Dipartimento di Scienze dell'Ambiente e del Territorio, Università degli Studi di Milano-Bicocca, Piazza della Scienza 1, 20126 Milano, Italy.
BMC Bioinformatics. 2011 May 14;12:158. doi: 10.1186/1471-2105-12-158.
Molecular dynamics (MD) simulations are powerful tools for investigating the conformational dynamics of proteins, which is often a critical element of their function. Functionally relevant conformations are generally identified by clustering the large ensembles of structures that these simulations generate. Recently, Self-Organising Maps (SOMs) have been reported to perform more accurately and to provide more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering.
The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cartesian coordinates of the Cα atoms of conformations sampled in the essential space were used as input data vectors for SOM training, and complete linkage clustering was then performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied both to individual trajectories of the SH3 domain and to groups of trajectories analysed simultaneously. The results demonstrated the potential of this approach for the analysis of large ensembles of molecular structures: it produces a topological mapping of the conformational space in a simple 2D visualisation and effectively highlights differences in conformational dynamics that are directly related to biological function.
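The two-level scheme described above (a SOM trained on coordinate vectors, then complete linkage clustering of the SOM prototypes) can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the rectangular grid, Gaussian neighbourhood, linearly decaying learning rate and radius, the fixed frame stride, and the function names `train_som`, `complete_linkage` and `two_level_clustering` are hypothetical choices made for illustration. The input is assumed to be frames already superimposed and flattened into vectors of Cα coordinates; the essential-space projection and the Taguchi-based selection of SOM parameters are not shown, and the synthetic data only stand in for a real trajectory.

```python
# Illustrative sketch only: all parameter choices below are assumptions,
# not the published protocol.
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, protos):
    """Index of the prototype closest to vector x."""
    return min(range(len(protos)), key=lambda k: sq_dist(x, protos[k]))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM (sequential updates, Gaussian neighbourhood).

    Returns rows*cols prototype vectors; node k sits at grid cell (k // cols, k % cols).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise prototypes with randomly chosen (copied) input vectors.
    protos = [list(rng.choice(data)) for _ in range(rows * cols)]
    grid = [(k // cols, k % cols) for k in range(rows * cols)]
    sigma0 = sigma0 or max(rows, cols) / 2.0
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = grid[best_matching_unit(x, protos)]
            for (r, c), p in zip(grid, protos):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for j in range(dim):
                    p[j] += lr * h * (x[j] - p[j])   # pull node towards x
            step += 1
    return protos


def complete_linkage(vectors, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    The distance between two clusters is the largest pairwise distance between
    their members. Returns one label in 0..n_clusters-1 per input vector.
    """
    dist = [[math.sqrt(sq_dist(a, b)) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so deleting b keeps index a valid
        del clusters[b]
    labels = [0] * len(vectors)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


def two_level_clustering(frames, stride, rows, cols, n_clusters, seed=0):
    """Level 1: SOM on a subsampled trajectory. Level 2: cluster the prototypes.

    Every frame is then labelled with the cluster of its best-matching unit.
    """
    sampled = frames[::stride]  # keep one frame every `stride` frames
    protos = train_som(sampled, rows, cols, seed=seed)
    proto_labels = complete_linkage(protos, n_clusters)
    return [proto_labels[best_matching_unit(x, protos)] for x in frames]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for superimposed, flattened C-alpha coordinates:
    # two tight "conformational states" far apart in a 3-D feature space.
    state_a = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(20)]
    state_b = [[rng.gauss(10.0, 0.3) for _ in range(3)] for _ in range(20)]
    labels = two_level_clustering(state_a + state_b, stride=2, rows=3, cols=3, n_clusters=2)
    print(labels)
```

Clustering the prototypes rather than every frame keeps the hierarchical step cheap, because its pairwise distance matrix grows with the number of map nodes rather than with the length of the trajectory. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='complete'` could replace the naive merging loop.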
A two-level approach combining SOMs and hierarchical clustering was proposed for the conformational analysis of structural ensembles of proteins. It can easily be extended to other case studies and to conformational ensembles from other sources.