Cretu Miruna T, Pérez-Ríos Jesús
Department of Chemistry, Imperial College London, London SW7 2AZ, UK and Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany.
Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany.
Phys Chem Chem Phys. 2021 Feb 4;23(4):2891-2898. doi: 10.1039/d0cp05509c.
We show that intuitive and accessible molecular features make it possible to predict the temperature-dependent second virial coefficient of organic and inorganic compounds with Gaussian process regression. In particular, we built a low-dimensional feature representation based on intrinsic molecular properties, topology, and physical properties relevant to the characterization of molecule-molecule interactions. This featurization was used to predict second virial coefficients in the interpolative regime with a relative error ⪅1%, and to extrapolate to temperatures outside the training range of each compound in the dataset with a relative error of 2.1%. Additionally, the model's predictive abilities were extended to organic molecules unseen in the training process, yielding predictions with a relative error of 2.7%. For such predictions, test molecules must be well represented in the training set by a wide variety of instances of their families. The method generally performs better than several semi-empirical procedures used to predict this quantity. Therefore, apart from being robust, the present Gaussian process regression model is extensible to a variety of organic and inorganic compounds.
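As a rough illustration of the interpolative setting described above, the following sketch fits a Gaussian process to a second virial coefficient curve and predicts it at an unseen temperature inside the training range. It is not the authors' model: temperature is the only feature, the training values come from the van der Waals expression B(T) = b - a/(RT) with arbitrary illustrative constants `a` and `b` (not data for any real compound), and the squared-exponential kernel, its length scale, and the jitter are fixed by hand rather than optimized. All function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)

def b_vdw(T, a=0.5, b=4.0e-5):
    """Toy van der Waals second virial coefficient, B(T) = b - a/(R T), in m^3/mol.
    a and b are arbitrary illustrative values, not fitted to any compound."""
    return b - a / (R * T)

def rbf(x1, x2, length):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, rhs):
    """Solve (L L^T) x = rhs by forward then backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, length, jitter=1e-6):
    """Fit a zero-mean GP to standardized targets; hyperparameters are fixed."""
    mean = sum(ys) / len(ys)
    scale = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
    z = [(y - mean) / scale for y in ys]
    K = [[rbf(xi, xj, length) + (jitter if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    L = cholesky(K)
    return {"xs": xs, "L": L, "alpha": chol_solve(L, z),
            "mean": mean, "scale": scale, "length": length}

def gp_predict(model, x):
    """Posterior mean and standard deviation at x, in the original units."""
    k = [rbf(x, xi, model["length"]) for xi in model["xs"]]
    mu = sum(ki * ai for ki, ai in zip(k, model["alpha"]))
    v = chol_solve(model["L"], k)
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return model["mean"] + model["scale"] * mu, model["scale"] * math.sqrt(var)

# Training temperatures from 250 K to 600 K in 50 K steps (toy data).
train_T = [250.0 + 50.0 * i for i in range(8)]
train_B = [b_vdw(T) for T in train_T]
model = gp_fit(train_T, train_B, length=100.0)

# Interpolative test point: inside the training range, not in the training set.
T_test = 425.0
pred, std = gp_predict(model, T_test)
rel_err = abs(pred - b_vdw(T_test)) / abs(b_vdw(T_test))
print(f"B({T_test} K): predicted {pred:.4e}, toy reference {b_vdw(T_test):.4e}, "
      f"relative error {rel_err:.2e}, predictive std {std:.2e}")
```

The posterior standard deviation returned alongside the mean is one reason Gaussian processes suit this kind of task: it grows as the query temperature moves away from the training points, which gives a built-in warning when a prediction is extrapolative rather than interpolative.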