Density Prediction Models for Energetic Compounds Merely Using Molecular Topology.

Affiliations

School of Computer Science and Technology, Southwest University of Science & Technology, Mianyang 621010, Sichuan, China.

Institute of Chemical Materials, China Academy of Engineering Physics (CAEP), P.O. Box 919-311, Mianyang 621999, Sichuan, China.

Publication information

J Chem Inf Model. 2021 Jun 28;61(6):2582-2593. doi: 10.1021/acs.jcim.0c01393. Epub 2021 Apr 12.

Abstract

Newly developed high-throughput methods for property prediction make materials design faster and more efficient. Density is an important physical property of energetic compounds because it is used to assess detonation velocity and detonation pressure, but the time cost of recent density prediction models remains high owing to the time-consuming calculation of molecular descriptors. To improve the screening efficiency of potential energetic compounds, new density prediction methods with higher accuracy and lower time cost are urgently needed, and a possible solution is to establish direct mappings between molecular structure and density. We propose three machine learning (ML) models, namely support vector machine (SVM), random forest (RF), and graph neural network (GNN), that use molecular topology as the only input. The widely applied quantitative structure-property relationship based on density functional theory (DFT-QSPR) is adopted as the benchmark to evaluate the accuracy of the models. All four models are trained and tested on the same data set, which contains over 2000 reported nitro compounds retrieved from the Cambridge Structural Database. The proportion of compounds with a prediction error of less than 5% is evaluated on the independent test set; the values for the SVM, RF, DFT-QSPR, and GNN models are 48, 63, 85, and 88%, respectively. The results show that, for the SVM and RF models, fingerprint bit vectors alone are not sufficient to obtain good QSPRs. The mapping between molecular structure and density can be well established by using a GNN and molecular topology, and its accuracy is slightly better than that of the time-consuming DFT-QSPR method. The GNN-based model has higher accuracy and a lower computational cost than the widely accepted DFT-QSPR model, so it is more suitable for high-throughput screening of energetic compounds.
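
The accuracy figure quoted in the abstract (the share of test-set compounds predicted within 5% error) can be illustrated with a minimal sketch. The abstract does not define the error formula, so this assumes relative error, |predicted - measured| / measured. The function name `fraction_within_tolerance` and the density values are hypothetical and are not taken from the paper.

```python
def fraction_within_tolerance(measured, predicted, tolerance=0.05):
    """Return the fraction of compounds whose relative error is below `tolerance`.

    Assumption: error is |predicted - measured| / measured. The abstract
    does not state the exact definition used by the authors.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not measured:
        raise ValueError("at least one compound is required")
    # Count the compounds whose prediction falls inside the tolerance band.
    hits = sum(
        abs(p - m) / m < tolerance
        for m, p in zip(measured, predicted)
    )
    return hits / len(measured)


# Made-up densities in g/cm^3, for illustration only (not data from the paper).
measured = [1.60, 1.80, 1.70, 1.90]
predicted = [1.62, 1.70, 1.71, 2.05]
print(fraction_within_tolerance(measured, predicted))  # prints 0.5
```

Under this definition, a reported value of 88% would mean that 88 of every 100 test-set compounds have a predicted density within 5% of the experimental one.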
