Edward Danquah Donkor, Alessandro Laio, Ali Hassanali
The Abdus Salam International Centre for Theoretical Physics (ICTP), Strada Costiera 11, 34151 Trieste, Italy.
Scuola Internazionale Superiore di Studi Avanzati (SISSA), via Bonomea 265, 34136 Trieste, Italy.
J Chem Theory Comput. 2023 Jul 25;19(14):4596-4605. doi: 10.1021/acs.jctc.2c01205. Epub 2023 Mar 15.
Machine learning (ML) has become a key workhorse in molecular simulations. Building an ML model in this context involves encoding information about chemical environments using local atomic descriptors. In this work, we focus on the Smooth Overlap of Atomic Positions (SOAP) descriptors and their application to studying the properties of liquid water, both in the bulk and at the hydrophobic air-water interface. Using a statistical test designed to assess the relative information content of different distance measures defined on the same data space, we investigate whether these descriptors provide the same information as some of the common order parameters used to characterize local water structure, such as hydrogen bonding, density, and tetrahedrality. Our analysis suggests that the ML description and the standard order parameters of local water structure are not equivalent. In particular, a combination of these order parameters probing local water environments can predict SOAP similarity only approximately; conversely, environments that are similar according to SOAP are not necessarily similar according to the standard order parameters. We also elucidate the role of some of the metaparameters in the SOAP definition in encoding chemical information.
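The abstract does not name its statistical test. One test built for exactly this purpose is the information imbalance of Glielmo et al. (2022): for two distance measures A and B on the same points, Δ(A→B) is 2/N times the average rank, in space B, of each point's nearest neighbor in space A. Values near 0 mean that neighborhoods in A predict neighborhoods in B; values near 1 mean A carries essentially no information about B. Assuming this is the kind of test meant, a minimal NumPy sketch follows. The function names and the toy data are hypothetical stand-ins for SOAP vectors and order-parameter vectors, and plain Euclidean distances are assumed in both spaces.

```python
import numpy as np


def rank_matrix(X):
    """Return R with R[i, j] = rank of point j among the neighbors of point i.

    Rank 1 is the nearest neighbor. Distances are Euclidean (an assumption);
    the self-distance is set to +inf so a point is never its own neighbor.
    """
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)               # neighbors sorted by distance
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks


def information_imbalance(XA, XB):
    """Delta(A -> B) = (2 / N) * <rank in B of the nearest neighbor in A>."""
    n = XA.shape[0]
    nn_in_A = np.argmin(rank_matrix(XA), axis=1)  # nearest neighbor of each i in A
    ranks_B = rank_matrix(XB)
    return 2.0 / n * ranks_B[np.arange(n), nn_in_A].mean()


# Toy usage (hypothetical data): B is a 1D projection of the 3D space A.
rng = np.random.default_rng(0)
A = rng.normal(size=(500, 3))
B = A[:, :1]
print("Delta(A -> B) =", information_imbalance(A, B))  # A largely predicts B
print("Delta(B -> A) =", information_imbalance(B, A))  # B misses information in A
```

Because Δ is not symmetric, computing both Δ(A→B) and Δ(B→A) separates the cases "A contains B", "B contains A", "equivalent", and "partially overlapping"; a finding that each description predicts the other only approximately would correspond to the last case. The brute-force distance matrix is chosen for clarity and is practical only up to a few thousand points.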