Dept. of Materials Science and Engineering, 244 MSE, University of Wisconsin, Madison, 53562.
Mol Inform. 2020 Jun;39(6):e1900101. doi: 10.1002/minf.201900101. Epub 2020 Feb 20.
Flash points of organic molecules play an important role in preventing flammability hazards, and large databases of measured values exist, although millions of compounds remain unmeasured. To rapidly extend existing data to new compounds, many researchers have used quantitative structure-property relationship (QSPR) analysis to predict flash points. In recent years, graph-based deep learning (GBDL) has emerged as a powerful alternative to traditional QSPR. In this paper, GBDL models were applied to flash point prediction for the first time. We assessed the performance of two GBDL models, a message-passing neural network (MPNN) and a graph convolutional neural network (GCNN), by comparing them against 12 previous QSPR studies that used more traditional methods. Our results show that MPNN outperforms GCNN and performs slightly worse than, but comparably to, previous QSPR studies. The average R² and mean absolute error (MAE) scores of MPNN are, respectively, 2.3 % lower and 2.0 K higher than those of previous comparable studies. To further explore GBDL models, we collected the largest flash point dataset to date, which contains 10575 unique molecules. The optimized MPNN gives a test data R² of 0.803 and an MAE of 17.8 K on the complete dataset. We also extracted 5 datasets from our integrated dataset based on molecular type (acids, organometallics, organogermaniums, organosilicons, and organotins) and explored the quality of the model in these classes.
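For readers unfamiliar with the two scores quoted above, the following is a minimal sketch of how R² and MAE can be computed for a set of flash point predictions. It assumes R² denotes the coefficient of determination (the abstract does not state which definition was used), and the function names and the measured and predicted values are hypothetical illustrations, not data or code from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical flash points in kelvin, for illustration only (not from the paper).
measured = [300.0, 320.0, 350.0, 400.0]
predicted = [310.0, 315.0, 345.0, 410.0]

mae = mean_absolute_error(measured, predicted)
r2 = r_squared(measured, predicted)
print(f"MAE = {mae:.1f} K, R^2 = {r2:.3f}")
```

Because MAE is expressed in the same unit as the target, a value such as 17.8 K can be read directly as a typical prediction error, whereas R² is unitless and describes the fraction of variance in the measured values that the model explains.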