Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland.
J Chem Inf Model. 2022 Apr 11;62(7):1602-1617. doi: 10.1021/acs.jcim.1c01438. Epub 2022 Mar 30.
Conformational sampling of protein structures is essential for understanding biochemical functions and for predicting thermodynamic properties such as free energies. Where previous approaches rely on sequential sampling procedures, recent developments in generative deep neural networks have made possible the parallel, statistically independent sampling of molecular configurations. To accurately generate samples of large molecular systems from a high-dimensional multimodal equilibrium distribution function, we developed a hierarchical approach based on expressive normalizing flows with rational quadratic neural splines and a coarse-grained representation. Furthermore, system-specific priors and adaptive, property-based controlled learning were designed to reduce the likelihood of generating high-energy structures during sampling. Finally, backmapping from the coarse-grained to a fully atomistic representation is performed through an equivariant transformer model. We demonstrate the applicability of the method on the one-shot configurational sampling of a protein system with more than a hundred amino acids. The results show enhanced expressivity that mitigates the invertibility constraints inherent in the normalizing flow framework. Moreover, the capacity of the hierarchical normalizing flow model was tested on a challenging case study of the folding/unfolding dynamics of the peptide chignolin.
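The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the generic building block it names: a one-dimensional monotonic rational-quadratic spline, in the form commonly used in neural spline flows. The function names, knot positions, and derivative values are hypothetical; in an actual flow these parameters would be produced by a neural network, and nothing here should be read as the authors' model.

```python
import numpy as np

def rq_spline(x, xk, yk, dk):
    """Forward pass of a monotonic rational-quadratic spline.

    xk, yk: increasing knot coordinates (length K+1).
    dk:     positive derivatives at the knots (length K+1).
    Returns the transformed values and log|dy/dx| (the log-Jacobian term).
    """
    x = np.asarray(x, dtype=float)
    # Locate the bin that contains each input value.
    k = np.clip(np.searchsorted(xk, x, side="right") - 1, 0, len(xk) - 2)
    w = xk[k + 1] - xk[k]            # bin width
    h = yk[k + 1] - yk[k]            # bin height
    s = h / w                        # bin slope
    xi = (x - xk[k]) / w             # relative position inside the bin
    denom = s + (dk[k + 1] + dk[k] - 2 * s) * xi * (1 - xi)
    y = yk[k] + h * (s * xi**2 + dk[k] * xi * (1 - xi)) / denom
    dydx = s**2 * (dk[k + 1] * xi**2 + 2 * s * xi * (1 - xi)
                   + dk[k] * (1 - xi)**2) / denom**2
    return y, np.log(dydx)

def rq_spline_inverse(y, xk, yk, dk):
    """Analytic inverse: solve a quadratic for the relative bin position."""
    y = np.asarray(y, dtype=float)
    k = np.clip(np.searchsorted(yk, y, side="right") - 1, 0, len(yk) - 2)
    w = xk[k + 1] - xk[k]
    h = yk[k + 1] - yk[k]
    s = h / w
    t = dk[k + 1] + dk[k] - 2 * s
    a = h * (s - dk[k]) + (y - yk[k]) * t
    b = h * dk[k] - (y - yk[k]) * t
    c = -s * (y - yk[k])
    xi = 2 * c / (-b - np.sqrt(b**2 - 4 * a * c))
    return xk[k] + xi * w

# Hypothetical spline parameters, chosen only for illustration.
xk = np.array([-2.0, -0.5, 1.0, 2.0])
yk = np.array([-2.0, -1.0, 0.2, 2.0])
dk = np.array([1.0, 0.6, 1.8, 1.0])

x = np.linspace(-1.9, 1.9, 7)
y, logdet = rq_spline(x, xk, yk, dk)
x_back = rq_spline_inverse(y, xk, yk, dk)
```

Because the map is strictly increasing and has a closed-form inverse, both sampling and exact likelihood evaluation through the change-of-variables formula remain cheap, which is the property a normalizing flow requires of each layer.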