Qu Chen, Kearsley Anthony J, Schneider Barry I, Keyrouz Walid, Allison Thomas C
National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA.
J Mol Graph Model. 2022 May;112:108149. doi: 10.1016/j.jmgm.2022.108149. Epub 2022 Feb 4.
We describe the training and validation of a machine learning model that predicts the normal boiling points of organic compounds. Data are drawn from the experimental literature as captured in the NIST Thermodynamics Research Center (TRC) SOURCE Data Archival System. The model is based on a graph neural network approach, a methodology that has proven powerful when applied to a variety of chemical problems. Model input is extracted from a 2D sketch of the molecule, which makes the method suitable for rapid prediction of normal boiling points in a wide variety of scenarios. Our final model predicts normal boiling points within 6 K (corresponding to a mean absolute percent error of 1.32%) with a sample standard deviation of less than 8 K. We also found that the model robustly identifies errors in the input data set during training, which further motivates the use of systematic data exploration in data-related efforts.
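The abstract reports three accuracy figures: an error in kelvin, a mean absolute percent error, and a sample standard deviation. The sketch below shows one common way to compute such quantities from paired predictions and measurements. It is an illustration only: the abstract does not give the exact definitions used, so reading "within 6 K" as a mean absolute error and taking the standard deviation over signed errors are assumptions, as are the function name `error_metrics` and the example temperatures, which are invented for the demonstration and are not data from the paper.

```python
import math


def error_metrics(predicted, observed):
    """Return (MAE in K, MAPE in %, sample standard deviation of errors in K).

    Assumed definitions (not confirmed by the abstract):
      MAE  = mean of |predicted - observed|
      MAPE = 100 * mean of |predicted - observed| / observed
      std  = sample standard deviation (n - 1 denominator) of signed errors
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two points for a sample standard deviation")

    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / o for e, o in zip(errors, observed)) / n

    mean_error = sum(errors) / n
    std = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    return mae, mape, std


# Hypothetical boiling points in kelvin, chosen only to make the arithmetic easy.
observed = [300.0, 400.0, 500.0]
predicted = [303.0, 396.0, 505.0]

mae, mape, std = error_metrics(predicted, observed)
print(f"MAE = {mae:.2f} K, MAPE = {mape:.2f} %, sample std = {std:.2f} K")
```

Because boiling points are expressed in kelvin, the observed values are strictly positive and the percent error is well defined for every compound.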