Tian Hao, Tao Peng
Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas 75205, United States.
J Chem Inf Model. 2020 Oct 26;60(10):4569-4581. doi: 10.1021/acs.jcim.0c00485. Epub 2020 Sep 1.
Molecular dynamics (MD) simulations are widely used to study macromolecules, including proteins. However, the high dimensionality of the data sets produced by simulations makes thorough analysis difficult and hinders a deeper understanding of biomacromolecules. To gain more insight into protein structure-function relations, appropriate dimensionality reduction methods are needed to project simulations onto low-dimensional spaces. Linear dimensionality reduction methods, such as principal component analysis (PCA) and time-structure-based independent component analysis (t-ICA), cannot preserve sufficient structural information. Nonlinear methods, such as t-distributed stochastic neighbor embedding (t-SNE), perform better than linear methods but are still limited in avoiding system noise and in keeping inter-cluster relations. ivis is a novel deep learning-based dimensionality reduction method originally developed for single-cell data sets. Here, we applied this framework to the study of the light, oxygen, and voltage (LOV) domains of diatom aureochrome 1a (PtAu1a). Compared with the other methods, ivis is shown to be superior in constructing a Markov state model (MSM), preserving both local and global distance information, and maintaining similarity between the high- and low-dimensional representations with the least information loss. Moreover, the feature weights in the ivis neural network offer new perspectives for deciphering protein allostery at the residue level. Overall, ivis is a promising addition to the analysis toolbox for proteins.
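To make the MSM step mentioned in the abstract concrete, the following is a minimal sketch of the simplest count-based transition matrix estimator. In a typical MSM workflow, frames of the low-dimensional projection are first clustered into discrete states, and transitions between those states are then counted at a chosen lag time. The abstract does not describe the authors' actual MSM pipeline, so the function name `estimate_transition_matrix`, the toy trajectory `dtraj`, and the lag of one frame are all illustrative assumptions. Only the Python standard library is used.

```python
from collections import Counter


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalize sliding-window transition counts at a lag given in frames.

    This is the plain (non-reversible) maximum-likelihood estimate; it is an
    illustrative baseline, not the estimator used in the study.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 frame")
    # Count every observed pair (state at t, state at t + lag).
    counts = Counter(zip(dtraj[:-lag], dtraj[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State i was never observed as a starting state; keep a zero row.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix


# Hypothetical discrete trajectory: one cluster label per simulation frame.
dtraj = [0, 0, 1, 1, 1, 0, 2, 2, 0, 0]
T = estimate_transition_matrix(dtraj, n_states=3, lag=1)
for row in T:
    print(row)
```

Each row of `T` holds the estimated probabilities of moving from one state to every other state after one lag time. A projection that separates metastable conformations cleanly tends to produce states with high self-transition probabilities, which is one reason the quality of the dimensionality reduction matters for MSM construction.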