Machine Learning Prediction of Nine Molecular Properties Based on the SMILES Representation of the QM9 Quantum-Chemistry Dataset.

Affiliations

Associate Laboratory for Computing and Applied Mathematics, National Institute for Space Research, PO BOX 515, 12227-010, São José dos Campos, SP, Brazil.

São Carlos Institute of Chemistry, University of São Paulo, PO Box 780, 13560-970, São Carlos, SP, Brazil.

Publication information

J Phys Chem A. 2020 Nov 25;124(47):9854-9866. doi: 10.1021/acs.jpca.0c05969. Epub 2020 Nov 11.

DOI: 10.1021/acs.jpca.0c05969
PMID: 33174750

Abstract

Machine learning (ML) models can potentially accelerate the discovery of tailored materials by learning a function that maps chemical compounds to their target properties. A crucial step is encoding the molecular systems for the ML model, and the choice of molecular representation plays a central role. Most representations are based on atomic coordinates (structure); however, this can increase the computational cost of ML training and prediction. Herein, we investigate the impact of choosing coordinate-free descriptors based on the Simplified Molecular Input Line Entry System (SMILES) representation, which can substantially reduce the computational cost of ML predictions. To this end, we evaluate the prediction performance of a feed-forward neural network (FNN) model over five feature selection methods and nine ground-state properties (including energetic, electronic, and thermodynamic properties) from a public data set composed of ∼130k organic molecules. Our best results reached a mean absolute error of ∼0.05 eV, close to chemical accuracy, for the atomization energies (internal energy at 0 K, internal energy at 298.15 K, enthalpy at 298.15 K, and free energy at 298.15 K). Moreover, for the atomization energies, the out-of-sample error was nine times lower than that of the same FNN model trained with the Coulomb matrix, a traditional coordinate-based descriptor. Furthermore, our results show how the model's accuracy is limited by employing such a low-computational-cost representation, which carries less information about the molecular structure than state-of-the-art methods.
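
To make the idea of a coordinate-free descriptor concrete, the following is a minimal Python sketch, not the authors' method. It assumes a hypothetical character vocabulary (`VOCAB`) and a simple count-based featurization of a SMILES string; the abstract does not specify the actual descriptors or the five feature selection methods. The helper `mean_absolute_error` and the constant `KCAL_PER_MOL_IN_EV` are likewise illustrative, included only to show how a reported error of about 0.05 eV compares with the usual 1 kcal/mol chemical-accuracy threshold.

```python
from collections import Counter

# Hypothetical vocabulary (an assumption, not taken from the paper):
# single characters that commonly appear in SMILES strings of small
# organic molecules built from C, H, N, O, and F.
VOCAB = ["C", "N", "O", "F", "c", "n", "o", "=", "#", "(", "1", "2"]

# 1 kcal/mol expressed in eV (the usual "chemical accuracy" threshold).
KCAL_PER_MOL_IN_EV = 0.0433641


def smiles_counts(smiles):
    """Return a fixed-length vector of character counts for a SMILES string.

    No atomic coordinates are needed: the features come from the string
    alone. Characters outside VOCAB are ignored.
    """
    counts = Counter(smiles)
    return [counts[token] for token in VOCAB]


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Acetic acid: two carbons, two oxygens, one double bond, one branch.
    print(smiles_counts("CC(=O)O"))
    # Express the reported ~0.05 eV error in units of chemical accuracy.
    print(0.05 / KCAL_PER_MOL_IN_EV)
```

A vector like this could serve as input to a feed-forward network, but it discards bond order context and all three-dimensional information, which illustrates the trade-off the abstract describes between computational cost and accuracy.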

