Jin Jun-Xuan, Ren Gao-Peng, Hu Jianjian, Liu Yingzhe, Gao Yunhu, Wu Ke-Jun, He Yuchen
Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, China.
Institute of Zhejiang University-Quzhou, Quzhou, 324000, China.
J Cheminform. 2023 Jul 19;15(1):65. doi: 10.1186/s13321-023-00736-6.
Machine learning has great potential for predicting chemical information with greater precision than traditional methods. Graph neural networks (GNNs) have become increasingly popular in recent years because they can learn molecular features automatically from the molecular graph, significantly reducing the time needed to identify and construct molecular descriptors. However, owing to insufficient data, the application of machine learning to predicting the properties of energetic materials is still at an early stage. In this work, we first curated from the Cambridge Structural Database a dataset of 12,072 compounds containing the CHON elements (C, H, O and N), which are traditionally regarded as the main constituent elements of energetic materials. We then refined our force field-inspired neural network (FFiNet) by adopting a Transformer encoder, yielding the force field-inspired Transformer network (FFiTrNet). With this improvement, our model outperforms other machine learning-based and GNN-based models and shows strong predictive capability, especially for high-density materials. Our model also demonstrates its capability to predict the crystal density of compounds in a dataset of potential energetic materials (the Huang & Massa dataset), which will be helpful for practical high-throughput screening of energetic materials.
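The dataset-curation step described in the abstract can be illustrated with a small element filter. This is a minimal sketch under an assumption the abstract does not state: a compound counts as CHON when every element in its molecular formula is C, H, N or O (not necessarily all four). The names `parse_formula` and `is_chon` are hypothetical; this is not the authors' curation script and does not use any CSD API.

```python
import re

# Assumption: a "CHON compound" is one whose formula contains only C, H, N, O.
# Illustrative filter only, not the authors' actual curation procedure.
ALLOWED = {"C", "H", "N", "O"}

# An element symbol (capital letter + optional lowercase letter) followed by an
# optional count. Characters that match nothing (spaces, brackets) are skipped.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C7H5N3O6' or 'C7 H5 N3 O6' into counts.

    Parenthesised groups, charges and hydrate notation are not handled.
    """
    counts: dict[str, int] = {}
    for element, n in TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def is_chon(formula: str) -> bool:
    """Return True if the formula is non-empty and every element is C, H, N or O."""
    elements = set(parse_formula(formula))
    return bool(elements) and elements <= ALLOWED


if __name__ == "__main__":
    # TNT, RDX, a chlorinated nitroaromatic, methane, sodium azide
    formulas = ["C7H5N3O6", "C3H6N6O6", "C6H3ClN2O4", "CH4", "NaN3"]
    print([f for f in formulas if is_chon(f)])
    # → ['C7H5N3O6', 'C3H6N6O6', 'CH4']
```

TNT and RDX pass, while the chlorine-containing compound and sodium azide are rejected. If "CHON" were instead meant to require all four elements, the check would become `elements == ALLOWED`, which would also exclude methane.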