Austermeier Lea E, Voigt Karsten, Böhme Alexander, Ulrich Nadin
Department of Exposure Science, Helmholtz Centre for Environmental Research - UFZ, Permoserstrasse 15, Leipzig D-04318, Germany.
PAULY, Theresienstrasse 50, Leipzig D-04129, Germany.
ACS Omega. 2025 Jun 3;10(23):24296-24306. doi: 10.1021/acsomega.5c00205. eCollection 2025 Jun 17.
The melting point (MP) of a chemical is an important physicochemical property that characterizes the transition from the solid to the liquid state. The MP is a key parameter in molecular design and is relevant in many fields, such as drug design and environmental science. An accurate prediction of the MP is therefore of great interest. Here, we develop two graph convolutional neural network (GNN) models for the prediction of the MP: one without a data augmentation strategy and one with it. The models were developed on a data set containing 28,645 chemicals, where we removed duplicates and data points labeled as faulty. We then split the data set into training, validation, and test sets. The model was trained on this initial data set and on a more highly curated data set. With data augmentation, we could increase the number of neurons in each of the two hidden layers of the GNN and strengthen the representation of large and complex molecules. We compared the influence of the curation step and of the data augmentation and found that the curation step had no significant influence on model performance, whereas data augmentation improved the model. With a consensus model, we achieved a root-mean-square error (RMSE) of 35.4 °C.
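The abstract reports the final error as an RMSE obtained with a consensus model, but it does not state how the consensus is formed. The following minimal sketch assumes the simplest choice, an unweighted mean of the two models' predictions, and shows how the RMSE in °C would then be computed. The function names and all numeric values are hypothetical illustrations, not data or code from the study.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def consensus(pred_a, pred_b):
    """Unweighted mean of two prediction lists (assumed consensus rule)."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


# Made-up melting points in degrees Celsius, for illustration only.
measured = [100.0, 50.0, 200.0]
model_plain = [110.0, 40.0, 190.0]      # hypothetical model without augmentation
model_augmented = [90.0, 60.0, 220.0]   # hypothetical model with augmentation

combined = consensus(model_plain, model_augmented)

print("RMSE plain:    ", rmse(measured, model_plain))
print("RMSE augmented:", rmse(measured, model_augmented))
print("RMSE consensus:", rmse(measured, combined))
```

In this toy data the two models err in opposite directions, so averaging cancels part of the error. That is the usual motivation for a consensus prediction, though it says nothing about how large the gain was in the study itself.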