Department of Mathematics and Computer Science, Computational Molecular Biology Group, Arnimallee 6, 14195 Berlin, Germany.
Bioinformatics. 2019 May 1;35(9):1588-1590. doi: 10.1093/bioinformatics/bty818.
Many modeling analyses of molecular dynamics (MD) simulations rest on a definition of states, which can be clusters (or groups of clusters) of simulation frames in a feature space composed of molecular coordinates. As the dimension of this feature space grows (due to the increasing size or complexity of the simulated molecule), it becomes very difficult to cluster the underlying MD data and to estimate a statistically robust model. To mitigate this "curse of dimensionality", one can reduce the feature space, e.g., with principal component or time-lagged independent component analysis transformations, focusing the analysis on the most important modes of transitions. In practice, however, all these reduction strategies may neglect important molecular details that are amenable to experimental verification.
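As a rough illustration of the feature-space reduction mentioned above, the following sketch projects simulation frames onto their leading principal components using plain NumPy. The function name `pca_project`, the synthetic data, and the choice of two components are assumptions made for this example only; they do not come from any particular MD analysis package.

```python
import numpy as np

def pca_project(features, n_components):
    """Project frames (rows) onto the top principal components.

    features: array of shape (n_frames, n_features) (hypothetical layout).
    Returns the projected frames and the explained-variance fractions.
    """
    # Center each feature so the SVD captures variance, not the mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Synthetic stand-in for MD features: 500 frames, 20 features,
# with most of the variance along a single latent direction.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
frames = 5.0 * (latent @ rng.normal(size=(1, 20))) + rng.normal(size=(500, 20))

projected, explained = pca_project(frames, n_components=2)
print(projected.shape, explained)
```

The projection keeps only the directions of largest variance, which is exactly why per-feature molecular detail can be lost: the retained components are linear mixtures of all input features.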
To recover such molecular details, I have developed PySFD (Significant Feature Differences analyzer for Python), a multi-processing software package that efficiently selects significantly different features of any user-defined feature type among potentially many different simulated state ensembles, such as meta-stable states of a Markov State Model (MSM). Applying PySFD to MSMs built from an aggregate of 300 microseconds of MD simulations recently performed on the major histocompatibility complex class II (MHCII) protein, I demonstrate how this toolkit can extract and visualize valuable mechanistic information from big MD simulation data, e.g., in the form of networks of dynamic interaction changes connecting functionally relevant sites of a protein complex.
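The general idea of selecting significantly different features between two state ensembles can be sketched as follows. This is not the PySFD API: the function name, the input layout (one row per independent simulation run), and the significance criterion (mean difference exceeding a multiple of the combined standard error) are all assumptions chosen to keep the example minimal.

```python
import numpy as np

def significant_feature_differences(ens_a, ens_b, n_sigma=2.0):
    """Return indices of features whose ensemble means differ significantly.

    ens_a, ens_b: arrays of shape (n_runs, n_features), one row per
    independent simulation run (hypothetical input layout).
    n_sigma: assumed threshold, in units of the combined standard error.
    """
    mean_a, mean_b = ens_a.mean(axis=0), ens_b.mean(axis=0)
    # Standard error of the mean across runs, per feature.
    sem_a = ens_a.std(axis=0, ddof=1) / np.sqrt(ens_a.shape[0])
    sem_b = ens_b.std(axis=0, ddof=1) / np.sqrt(ens_b.shape[0])
    diff = mean_a - mean_b
    combined = np.sqrt(sem_a**2 + sem_b**2)
    significant = np.abs(diff) > n_sigma * combined
    return np.flatnonzero(significant), diff

# Made-up feature values for two ensembles, three runs each.
ens_a = np.array([[1.0, 5.0, 0.2],
                  [1.1, 5.2, 0.8],
                  [0.9, 4.8, 0.5]])
ens_b = np.array([[1.0, 7.0, 0.4],
                  [1.2, 7.1, 0.6],
                  [0.8, 6.9, 0.5]])

indices, diff = significant_feature_differences(ens_a, ens_b)
print(indices, diff)
```

In this toy data only the second feature (index 1) differs between the ensembles by more than its run-to-run uncertainty, so it is the only one selected. Because each feature is tested independently, the per-feature computation is straightforward to distribute across processes.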
PySFD is freely available under the L-GPL license at https://github.com/markovmodel/PySFD.
Supplementary data are available at Bioinformatics online.