Singh Samrendra K, Bejagam Karteek K, An Yaxin, Deshmukh Sanket A
CNH Industrial, Burr Ridge, Illinois 60527, United States.
Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States.
J Phys Chem A. 2019 Jun 20;123(24):5190-5198. doi: 10.1021/acs.jpca.9b03420. Epub 2019 Jun 11.
Accurate, fast, on-the-fly analysis of molecular dynamics (MD) simulation trajectories becomes critical when discovering new materials or developing force-field parameters, because these processes are automated. Here, to overcome the drawbacks of algorithm-based analysis approaches, we have developed and applied, for the first time, an approach that integrates a machine-learning (ML) based stacked ensemble model (SEM) with MD simulations. As a proof of concept, two SEMs were developed to analyze two dynamical properties of a water droplet: its contact angle and its hydrogen bonds. Each SEM consisted of a two-layer network of random forest, artificial neural network, support vector regression, kernel ridge regression, and k-nearest neighbors ML models. The root-mean-square error values, uncertainty quantification, and sensitivity analysis of both SEMs suggested that the final result was more accurate than that of the individual ML models. This new computational framework is general and robust, and it has considerable potential for analyzing large MD simulation trajectories because it can capture critical information accurately.
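The two-layer stacking idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes only two base learners (k-nearest neighbors and a one-feature linear fit) instead of the five model types named above, a least-squares blend as the second layer, and a synthetic noisy sine target standing in for a trajectory-derived property such as contact angle. The names `knn_fit`, `linear_fit`, and `fit_stack` are hypothetical.

```python
import math
import random

def knn_fit(xs, ys, k=5):
    """Base learner 1: average the targets of the k nearest training points."""
    data = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def linear_fit(xs, ys):
    """Base learner 2: ordinary least squares on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def rmse(pred, true):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def fit_stack(xs, ys, fitters, n_folds=5):
    """Layer 1: out-of-fold base predictions. Layer 2: least-squares blend.

    Assumes exactly two base learners so the blend weights come from
    2x2 normal equations solved in closed form.
    """
    n = len(xs)
    oof = [[0.0, 0.0] for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held = [i for i in range(n) if i % n_folds == fold]
        for j, fit in enumerate(fitters):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held:
                oof[i][j] = model(xs[i])
    a11 = sum(r[0] * r[0] for r in oof)
    a12 = sum(r[0] * r[1] for r in oof)
    a22 = sum(r[1] * r[1] for r in oof)
    b1 = sum(r[0] * y for r, y in zip(oof, ys))
    b2 = sum(r[1] * y for r, y in zip(oof, ys))
    det = a11 * a22 - a12 * a12
    weights = ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)
    # Refit each base learner on all training data for final predictions.
    base = [fit(xs, ys) for fit in fitters]
    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, base))
    return predict, base, weights

rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

def make_data(n):
    """Synthetic stand-in data: a noisy sine curve on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

train_x, train_y = make_data(150)
test_x, test_y = make_data(50)

stack, base, weights = fit_stack(train_x, train_y, [knn_fit, linear_fit])
rmse_knn = rmse([base[0](x) for x in test_x], test_y)
rmse_lin = rmse([base[1](x) for x in test_x], test_y)
rmse_stack = rmse([stack(x) for x in test_x], test_y)
print("blend weights (kNN, linear):", weights)
print("RMSE kNN / linear / stacked:", rmse_knn, rmse_lin, rmse_stack)
```

The second layer is trained on out-of-fold predictions, not in-sample ones, so the blend weights reflect how each base learner performs on data it has not seen. Comparing the printed RMSE values mirrors, on toy data only, the abstract's comparison between the stacked model and the individual models.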