Prediction of Standard Combustion Enthalpy of Organic Compounds Combining Machine Learning and Chemical Graph Theory: A Strategy.

Author information

Saviñon-Flores Fernanda, Arzola-Flores Jesús A, García-Castro Miguel A, Díaz-Sánchez Fausto, Vidal Robles Esmeralda, Maruri Valderrabano Fidel Aaron

Affiliation

Facultad de Ingeniería Química de la Benemérita Universidad Autónoma de Puebla, 18 Sur y Avenue San Claudio, C.P., Puebla, Pue 72570, México.

Publication information

ACS Omega. 2025 Sep 8;10(36):41828-41848. doi: 10.1021/acsomega.5c05927. eCollection 2025 Sep 16.

Abstract

Predicting thermochemical properties such as the standard enthalpy of combustion is essential for the design and evaluation of energetic materials. This study proposes predicting that property through a QSPR strategy that combines machine learning and chemical graph theory. The data set consisted of 3477 organic compounds. The SMILES code of each molecule was used to construct its molecular graph, from which topological indices such as Estrada, Wiener, and Gutman, as well as centrality measures, were calculated. These descriptors served as predictors in supervised learning models, with tree-based ensemble models showing the best performance. The best-performing model, random forest, achieved R² = 0.9810 on the test set, alongside further metric values of 287.5988 kJ·mol⁻¹, 0.1048, 551.9050 kJ·mol⁻¹, and 0.1933. Interpretability analysis using SHAP confirmed that the Estrada and Gutman indices were the most influential variables in the predictions. In addition, the same random forest model was trained using 210 molecular descriptors obtained from RDKit, yielding slightly better metrics: R² = 0.9927, with corresponding values of 142.2272 kJ·mol⁻¹, 0.0484, 342.0464 kJ·mol⁻¹, and 0.1172. Moreover, specific models were developed for different families of compounds, achieving R² ≈ 0.99 in all cases. Finally, a clustering analysis using the K-Means algorithm in the space defined by the topological indices enabled the identification of latent molecular patterns, providing a novel framework for organizing and analyzing chemical space. This work demonstrates the potential of combining supervised and unsupervised learning methods with chemical graph theory to enable accurate, robust, and scalable prediction of thermochemical properties such as combustion enthalpy.
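As an illustration of the descriptor step, the three topological indices named in the abstract can be computed from a molecular graph using their standard textbook definitions: the Wiener index sums shortest-path distances over all atom pairs, the Gutman index weights each distance by the product of the two vertex degrees, and the Estrada index is the trace of the exponential of the adjacency matrix. The sketch below is not the authors' implementation. The function names are hypothetical, SMILES parsing is skipped in favour of a hand-written bond list, and a hydrogen-suppressed, unweighted, connected graph is assumed, which the abstract does not specify.

```python
from collections import deque
from math import factorial


def adjacency(n_atoms, bonds):
    """Build an adjacency list from a list of (atom, atom) bonds."""
    adj = [[] for _ in range(n_atoms)]
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path (bond count) distances from one atom to all others."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_and_gutman(adj):
    """Wiener: sum of d(u, v). Gutman: sum of deg(u) * deg(v) * d(u, v).

    Both sums run over unordered atom pairs; the graph is assumed connected.
    """
    deg = [len(neighbours) for neighbours in adj]
    wiener = gutman = 0
    for u in range(len(adj)):
        dist = bfs_distances(adj, u)
        for v in range(u + 1, len(adj)):
            wiener += dist[v]
            gutman += deg[u] * deg[v] * dist[v]
    return wiener, gutman


def estrada_index(adj, terms=30):
    """Estrada index tr(exp(A)), via the truncated series sum_k tr(A^k) / k!."""
    n = len(adj)
    a = [[0.0] * n for _ in range(n)]
    for u, neighbours in enumerate(adj):
        for v in neighbours:
            a[u][v] = 1.0
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    total = 0.0
    for k in range(terms):
        total += sum(power[i][i] for i in range(n)) / factorial(k)
        power = [[sum(power[i][m] * a[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return total


# n-Butane (SMILES: CCCC) as a four-carbon path, hydrogens suppressed.
butane = adjacency(4, [(0, 1), (1, 2), (2, 3)])
w, g = wiener_and_gutman(butane)
ee = estrada_index(butane)
print(w, g, round(ee, 4))
```

In a pipeline like the one described, each molecule would contribute one row of such indices to a feature table, which would then be passed to a regressor such as a random forest. In practice a cheminformatics toolkit would supply the bond list from the SMILES string.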

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9bfe/12444530/03aca470ab8e/ao5c05927_0001.jpg
