Tata Institute of Fundamental Research, Center for Interdisciplinary Sciences, Hyderabad 500046, India.
J Chem Phys. 2021 Sep 21;155(11):114106. doi: 10.1063/5.0059965.
Biomacromolecules undergo dynamic conformational fluctuations and interconvert among metastable states. A robust mapping of their conformational landscape often requires a low-dimensional projection of the conformational ensemble along optimized collective variables (CVs). However, the traditional choice of CVs is often limited by user intuition and prior knowledge of the system, and it lacks a rigorous assessment of their optimality relative to other candidate CVs. To address this issue, we propose an approach in which we first choose the possible combinations of inter-residue Cα distances within a given macromolecule as a set of input CVs. Subsequently, we derive latent-space CVs as non-linear combinations of these inputs by auto-encoding unbiased molecular dynamics simulation trajectories with a feed-forward neural network. We demonstrate the ability of the derived latent-space variables to elucidate the conformational landscape in four systems of hierarchically increasing complexity. The latent-space CVs identify key metastable states of a bead-in-a-spring polymer. Combining this dimensionality-reduction technique with a Markov state model built on the derived latent space reveals multiple spatially and kinetically well-resolved metastable conformations of the GB1 β-hairpin. A quantitative comparison, using variational-approach-based scoring, of the auto-encoder-derived latent-space CVs with those obtained via linear dimensionality-reduction methods (principal component analysis or time-structured independent component analysis) confirms the optimality of the former. As a practical application, the auto-encoder-derived CVs were found to predict the reinforced folding of the Trp-cage mini-protein in aqueous osmolyte solution. Finally, the protocol was able to decipher the conformational heterogeneity of a complex metalloenzyme, cytochrome P450.
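The input stage described above, inter-residue Cα distances used as candidate CVs, can be illustrated with a minimal sketch. It assumes that every residue pair is used and that each frame is given as a plain list of Cα coordinates; the abstract specifies neither. The function names `ca_distance_features` and `featurize` and the toy trajectory are hypothetical and are not taken from the paper. The resulting per-frame feature vectors are what an auto-encoder would then compress into latent-space CVs.

```python
import math
from itertools import combinations


def ca_distance_features(frame):
    """Return all pairwise C-alpha distances for one frame.

    `frame` is a list of (x, y, z) C-alpha coordinates, one per residue.
    For n residues the output has n*(n-1)/2 entries, ordered as
    (0,1), (0,2), ..., (n-2,n-1). Using every pair is an assumption.
    """
    return [math.dist(a, b) for a, b in combinations(frame, 2)]


def featurize(trajectory):
    """Apply ca_distance_features to every frame of a trajectory."""
    return [ca_distance_features(frame) for frame in trajectory]


# Toy trajectory: 2 frames of a hypothetical 3-residue chain (arbitrary units).
trajectory = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)],
]

features = featurize(trajectory)
print(features)  # one row of 3 pairwise distances per frame
```

Pairwise distances are invariant to overall translation and rotation of the molecule, which is one common reason to prefer them over raw Cartesian coordinates as auto-encoder input. The number of features grows quadratically with chain length, so larger systems may call for a reduced subset of pairs.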