John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States.
J Chem Theory Comput. 2022 Apr 12;18(4):2341-2353. doi: 10.1021/acs.jctc.1c00143. Epub 2022 Mar 11.
Computing accurate reaction rates is a central challenge in computational chemistry and biology because of the high cost of free energy estimation with unbiased molecular dynamics. In this work, a data-driven machine learning algorithm is devised to learn collective variables with a multitask neural network, in which a common upstream part reduces the high-dimensional atomic configurations to a low-dimensional latent space and separate downstream parts map the latent space to predictions of basin class labels and potential energies. The resulting latent space is shown to be an effective low-dimensional representation that captures the reaction progress and guides umbrella sampling to obtain accurate free energy landscapes. The approach is successfully applied to model systems including a 5D Müller-Brown model, a 5D three-well model, alanine dipeptide in vacuum, and an Au(110) surface reconstruction unit reaction. It enables automated dimensionality reduction for energy-controlled reactions in complex systems, offers a unified and data-efficient framework that can be trained with limited data, and outperforms single-task learning approaches, including autoencoders.
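The architecture described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the tanh activations, the loss weighting, and every name in it (`MultitaskCV`, `multitask_loss`, `umbrella_bias`) are assumptions made for illustration, and only the forward pass and the combined loss are shown (no training loop). A shared encoder maps a configuration to a latent collective variable, one head predicts basin class logits, and a second head predicts the potential energy. The harmonic bias at the end is the standard umbrella sampling restraint, assumed here to act on the latent variable.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    # Small random weights and zero biases for one dense layer.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def dense(x, params):
    w, b = params
    return x @ w + b

class MultitaskCV:
    """Shared encoder with two task heads (hypothetical layer sizes)."""

    def __init__(self, n_features, n_latent, n_basins, n_hidden=16):
        # Upstream part: configuration -> low-dimensional latent space.
        self.enc1 = init_layer(n_features, n_hidden)
        self.enc2 = init_layer(n_hidden, n_latent)
        # Downstream part 1: latent -> basin class logits.
        self.cls = init_layer(n_latent, n_basins)
        # Downstream part 2: latent -> scalar potential energy.
        self.en1 = init_layer(n_latent, n_hidden)
        self.en2 = init_layer(n_hidden, 1)

    def encode(self, x):
        return dense(np.tanh(dense(x, self.enc1)), self.enc2)

    def forward(self, x):
        z = self.encode(x)
        logits = dense(z, self.cls)
        energy = dense(np.tanh(dense(z, self.en1)), self.en2)[:, 0]
        return z, logits, energy

def multitask_loss(logits, energy, labels, energy_ref, weight=1.0):
    # Cross-entropy on basin labels plus weighted mean squared energy error.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    mse = np.mean((energy - energy_ref) ** 2)
    return ce + weight * mse

def umbrella_bias(z, center, k):
    # Harmonic restraint 0.5 * k * |z - center|^2 on the latent variable.
    return 0.5 * k * np.sum((z - center) ** 2, axis=-1)

# Toy usage on random data standing in for 5D model-system configurations.
x = rng.normal(size=(8, 5))
labels = rng.integers(0, 2, size=8)
energy_ref = rng.normal(size=8)

model = MultitaskCV(n_features=5, n_latent=1, n_basins=2)
z, logits, energy = model.forward(x)
loss = multitask_loss(logits, energy, labels, energy_ref)
bias = umbrella_bias(z, center=np.zeros(1), k=10.0)
```

Because both heads read the same latent vector, gradients from the classification and energy terms would both shape the encoder during training, which is the sense in which the latent space is learned by multiple tasks at once.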