Applying Large Graph Neural Networks to Predict Transition Metal Complex Energies Using the tmQM_wB97MV Data Set.

Affiliations

Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.

Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.

Publication information

J Chem Inf Model. 2023 Dec 25;63(24):7642-7654. doi: 10.1021/acs.jcim.3c01226. Epub 2023 Dec 4.

DOI:10.1021/acs.jcim.3c01226
PMID:38049389
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10751796/
Abstract

Machine learning (ML) methods have shown promise for discovering novel catalysts but are often restricted to specific chemical domains. Generalizable ML models require large and diverse training data sets, which exist for heterogeneous catalysis but not for homogeneous catalysis. The tmQM data set, which contains properties of 86,665 transition metal complexes calculated at the TPSSh/def2-SVP level of density functional theory (DFT), provided a promising training data set for homogeneous catalyst systems. However, we find that ML models trained on tmQM consistently underpredict the energies of a chemically distinct subset of the data. To address this, we present the tmQM_wB97MV data set, which filters out several structures in tmQM found to be missing hydrogens and recomputes the energies of all other structures at the ωB97M-V/def2-SVPD level of DFT. ML models trained on tmQM_wB97MV show no pattern of consistently incorrect predictions and much lower errors than those trained on tmQM. The ML models tested on tmQM_wB97MV were, from best to worst, GemNet-T > PaiNN ≈ SpinConv > SchNet. Performance consistently improves when using only neutral structures instead of the entire data set. However, while models saturate with only neutral structures, more data continue to improve the models when including charged species, indicating the importance of accurately capturing a range of oxidation states in future data generation and model development. Furthermore, a fine-tuning approach in which weights were initialized from models trained on OC20 led to drastic improvements in model performance, indicating transferability between ML strategies of heterogeneous and homogeneous systems.
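
The abstract distinguishes between overall error and a consistent underprediction on one subset of the data. The following is a minimal sketch of how such a pattern could be checked, assuming predictions are available as (group label, reference energy, predicted energy) records. The function name `error_summary`, the group labels, and all numbers are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict
from statistics import mean


def error_summary(records):
    """Group (label, reference, predicted) records; report MAE and mean signed error.

    A mean signed error far from zero in one group points to a systematic
    under- or overprediction there, even if the overall MAE looks acceptable.
    """
    by_group = defaultdict(list)
    for label, reference, predicted in records:
        by_group[label].append(predicted - reference)
    return {
        label: {
            "mae": mean(abs(e) for e in errors),
            "mean_signed_error": mean(errors),
            "n": len(errors),
        }
        for label, errors in by_group.items()
    }


# Made-up energies in arbitrary units, for illustration only (not from the paper).
records = [
    ("group_a", -10.0, -9.5),
    ("group_a", -12.0, -12.5),
    ("group_b", -20.0, -21.0),
    ("group_b", -18.0, -19.5),
]

summary = error_summary(records)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

In this toy input, the errors in `group_a` cancel (mean signed error of zero), while every prediction in `group_b` lies below its reference, which is the kind of one-sided pattern the abstract describes for models trained on tmQM.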


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/021f9457fcc5/ci3c01226_0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/2b954256ebe6/ci3c01226_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/d1eab39f7559/ci3c01226_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/af03c318a393/ci3c01226_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/a7364a718e08/ci3c01226_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/b34ca2975f02/ci3c01226_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/bf0319cb3ee4/ci3c01226_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1443/10751796/c35940653061/ci3c01226_0007.jpg

Similar articles

1
Applying Large Graph Neural Networks to Predict Transition Metal Complex Energies Using the tmQM_wB97MV Data Set.
J Chem Inf Model. 2023 Dec 25;63(24):7642-7654. doi: 10.1021/acs.jcim.3c01226. Epub 2023 Dec 4.
2
tmQM Dataset-Quantum Geometries and Properties of 86k Transition Metal Complexes.
J Chem Inf Model. 2020 Dec 28;60(12):6135-6146. doi: 10.1021/acs.jcim.0c01041. Epub 2020 Nov 9.
3
MABAL: a Novel Deep-Learning Architecture for Machine-Assisted Bone Age Labeling.
J Digit Imaging. 2018 Aug;31(4):513-519. doi: 10.1007/s10278-018-0053-3.
4
Machine Learning Frontier Orbital Energies of Nanodiamonds.
J Chem Theory Comput. 2023 Jul 25;19(14):4461-4473. doi: 10.1021/acs.jctc.2c01275. Epub 2023 Apr 13.
5
Machine Learning Models Predict Calculation Outcomes with the Transferability Necessary for Computational Catalysis.
J Chem Theory Comput. 2022 Jul 12;18(7):4282-4292. doi: 10.1021/acs.jctc.2c00331. Epub 2022 Jun 23.
6
Transfer learning using attentions across atomic systems with graph neural networks (TAAG).
J Chem Phys. 2022 May 14;156(18):184702. doi: 10.1063/5.0088019.
7
Navigating Transition-Metal Chemical Space: Artificial Intelligence for First-Principles Design.
Acc Chem Res. 2021 Feb 2;54(3):532-545. doi: 10.1021/acs.accounts.0c00686. Epub 2021 Jan 22.
8
Raman Spectra of Amino Acids and Peptides from Machine Learning Polarizabilities.
J Chem Inf Model. 2024 Jun 24;64(12):4601-4612. doi: 10.1021/acs.jcim.4c00077. Epub 2024 Jun 3.
9
Hybrid DFT Geometries and Properties for 17k Lanthanoid Complexes─The LnQM Data Set.
J Chem Inf Model. 2024 Feb 12;64(3):825-836. doi: 10.1021/acs.jcim.3c01832. Epub 2024 Jan 18.
10
Photoredox matching of earth-abundant photosensitizers with hydrogen evolving catalysts by first-principles predictions.
J Chem Phys. 2024 Feb 21;160(7). doi: 10.1063/5.0174837.

Articles citing this article

1
A Deep Generative Model for the Inverse Design of Transition Metal Ligands and Complexes.
JACS Au. 2025 Apr 23;5(5):2294-2308. doi: 10.1021/jacsau.5c00242. eCollection 2025 May 26.
2
Homogeneous catalyst graph neural network: A human-interpretable graph neural network tool for ligand optimization in asymmetric catalysis.
iScience. 2025 Jan 23;28(3):111881. doi: 10.1016/j.isci.2025.111881. eCollection 2025 Mar 21.
3
Data Checking of Asymmetric Catalysis Literature Using a Graph Neural Network Approach.
Molecules. 2025 Jan 16;30(2):355. doi: 10.3390/molecules30020355.
4
Automated prediction of ground state spin for transition metal complexes.
Digit Discov. 2024 Jul 12;3(8):1638-1647. doi: 10.1039/d4dd00093e. eCollection 2024 Aug 7.

References cited by this article

1
Evaluation of the MACE force field architecture: From medicinal chemistry to materials science.
J Chem Phys. 2023 Jul 28;159(4). doi: 10.1063/5.0155322.
2
Characterizing Uncertainty in Machine Learning for Chemistry.
J Chem Inf Model. 2023 Jul 10;63(13):4012-4029. doi: 10.1021/acs.jcim.3c00373. Epub 2023 Jun 20.
3
Best-Practice DFT Protocols for Basic Molecular Computational Chemistry.
Angew Chem Int Ed Engl. 2022 Oct 17;61(42):e202205735. doi: 10.1002/anie.202205735. Epub 2022 Sep 14.
4
Transfer learning using attentions across atomic systems with graph neural networks (TAAG).
J Chem Phys. 2022 May 14;156(18):184702. doi: 10.1063/5.0088019.
5
Comprehensive Basis-Set Testing of Extended Symmetry-Adapted Perturbation Theory and Assessment of Mixed-Basis Combinations to Reduce Cost.
J Chem Theory Comput. 2022 Apr 12;18(4):2308-2330. doi: 10.1021/acs.jctc.1c01302. Epub 2022 Mar 15.
6
Coupled Cluster Benchmark of New DFT and Local Correlation Methods: Mechanisms of Hydroarylation and Oxidative Coupling Catalyzed by Ru(II, III) Chloride Carbonyls.
J Phys Chem A. 2021 Oct 14;125(40):8987-8999. doi: 10.1021/acs.jpca.1c05124. Epub 2021 Sep 29.
7
Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package.
J Chem Phys. 2021 Aug 28;155(8):084801. doi: 10.1063/5.0055522.
8
Quantum chemical calculations of lithium-ion battery electrolyte and interphase species.
Sci Data. 2021 Aug 5;8(1):203. doi: 10.1038/s41597-021-00986-9.
9
tmQM Dataset-Quantum Geometries and Properties of 86k Transition Metal Complexes.
J Chem Inf Model. 2020 Dec 28;60(12):6135-6146. doi: 10.1021/acs.jcim.0c01041. Epub 2020 Nov 9.
10
Mapping Materials and Molecules.
Acc Chem Res. 2020 Sep 15;53(9):1981-1991. doi: 10.1021/acs.accounts.0c00403. Epub 2020 Aug 14.