Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.
J Chem Theory Comput. 2022 Mar 8;18(3):1467-1479. doi: 10.1021/acs.jctc.1c00813. Epub 2022 Feb 18.
The application of machine learning to theoretical chemistry has made it possible to combine the accuracy of quantum chemical energetics with the thorough sampling of finite-temperature fluctuations. To reach this goal, a diverse set of methods has been proposed, ranging from simple linear models to kernel regression and highly nonlinear neural networks. Here we apply two widely different approaches to the same challenging problem: the sampling of the conformational landscape of polypeptides at finite temperature. We develop a local kernel regression (LKR) model coupled with a supervised sparsity method and compare it with a more established approach based on Behler-Parrinello-type neural networks. In the context of the LKR, we discuss how the supervised selection of the reference pool of environments is crucial for achieving accurate potential energy surfaces at a competitive computational cost, and we leverage the locality of the model to infer which chemical environments are poorly described by the density-functional tight-binding (DFTB) baseline. We then discuss the relative merits of the two frameworks and perform Hamiltonian-reservoir replica-exchange Monte Carlo sampling and metadynamics simulations with the LKR and the neural network, respectively, to demonstrate that both frameworks can achieve converged and transferable sampling of the conformational landscape of complex and flexible biomolecules with comparable accuracy and computational cost.
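To make the idea of a sparse local kernel model on top of a baseline concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that each atomic environment is already encoded as a plain feature vector, that a Gaussian kernel is adequate, and that the reference pool `refs` is simply handed in; the supervised selection of that pool described in the abstract is not reproduced. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, M, sigma=1.0):
    """Gaussian kernel between environments X (n, d) and references M (m, d)."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / sigma**2)

def structure_kernel(envs, owner, n_struct, refs, sigma=1.0):
    """Sum the per-environment kernel rows over the environments of each structure."""
    K_env = gaussian_kernel(envs, refs, sigma)          # (n_env, m)
    K_struct = np.zeros((n_struct, refs.shape[0]))      # (n_struct, m)
    np.add.at(K_struct, owner, K_env)                   # owner[i] = structure of env i
    return K_struct, K_env

def fit(envs, owner, n_struct, delta_E, refs, sigma=1.0, reg=1e-8):
    """Regularized least-squares weights on the reference pool.

    delta_E holds the per-structure correction (reference minus baseline energy).
    """
    K_struct, _ = structure_kernel(envs, owner, n_struct, refs, sigma)
    K_MM = gaussian_kernel(refs, refs, sigma)
    A = K_struct.T @ K_struct + reg * K_MM
    return np.linalg.solve(A, K_struct.T @ delta_E)

def predict(envs, owner, n_struct, refs, w, sigma=1.0):
    """Return per-structure corrections and the local (per-environment) terms."""
    K_struct, K_env = structure_kernel(envs, owner, n_struct, refs, sigma)
    return K_struct @ w, K_env @ w

# Synthetic demonstration (assumed data, not from the paper).
rng = np.random.default_rng(0)
n_struct, envs_per_struct = 40, 4
envs = rng.normal(size=(n_struct * envs_per_struct, 2))
owner = np.repeat(np.arange(n_struct), envs_per_struct)
refs = envs[:5]                                         # hand-picked reference pool
w_true = rng.normal(size=5)
delta_E = structure_kernel(envs, owner, n_struct, refs)[0] @ w_true

w = fit(envs, owner, n_struct, delta_E, refs)
pred, local = predict(envs, owner, n_struct, refs, w)
```

Because the prediction is a sum of per-environment terms, the array `local` gives one way to inspect which environments carry the largest corrections to the baseline, in the spirit of the locality argument in the abstract.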