Frank Hu, Francis He, David J. Yaron
Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
J Chem Theory Comput. 2023 Sep 26;19(18):6185-6196. doi: 10.1021/acs.jctc.3c00491. Epub 2023 Sep 13.
Quantum chemistry provides chemists with invaluable information, but the high computational cost limits the size and type of systems that can be studied. Machine learning (ML) has emerged as a means to dramatically lower the cost while maintaining high accuracy. However, ML models often sacrifice interpretability by using components, such as the artificial neural networks of deep learning, that function as black boxes. These components impart the flexibility needed to learn from large volumes of data but make it difficult to gain insight into the physical or chemical basis for the predictions. Here, we demonstrate that semiempirical quantum chemical (SEQC) models can learn from large volumes of data without sacrificing interpretability. The SEQC model is that of density-functional-based tight binding (DFTB), with fixed atomic orbital energies and interactions that are one-dimensional functions of the interatomic distance. This model is trained on data in a manner analogous to that used to train deep learning models. Using benchmarks that reflect the accuracy of the training data, we show that the resulting model maintains a physically reasonable functional form while achieving an accuracy, relative to coupled cluster energies with a complete basis set extrapolation (CCSD(T)*/CBS), that is comparable to that of density functional theory (DFT). This suggests that trained SEQC models can achieve low computational cost and high accuracy without sacrificing interpretability. Use of a physically motivated model form also substantially reduces the amount of data needed to train the model compared to that required for deep learning models.
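To make the training setup described in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' code), assuming a PyTorch-style workflow: a toy tight-binding model in which the on-site orbital energy is a single learnable constant and the off-site Hamiltonian elements and pairwise repulsion are learnable one-dimensional functions of interatomic distance, fit to reference energies by gradient descent in the same way a deep learning model is trained. All names (PairFunction, ToyTightBinding, etc.) are illustrative assumptions, and the Gaussian-basis pair functions and synthetic dataset stand in for the spline forms and ab initio reference data a real SEQC training run would use.

# Hypothetical sketch: a toy DFTB-like model with learnable 1D pair functions,
# trained by backpropagation against reference energies.
import torch

torch.manual_seed(0)


class PairFunction(torch.nn.Module):
    """Smooth 1D function of distance: a learnable linear combination of Gaussian
    basis functions (standing in for the spline/analytic forms of a real SEQC model)."""

    def __init__(self, r_min=0.5, r_max=5.0, n_basis=16):
        super().__init__()
        self.centers = torch.linspace(r_min, r_max, n_basis)
        self.width = (r_max - r_min) / n_basis
        self.coeff = torch.nn.Parameter(torch.zeros(n_basis))

    def forward(self, r):  # r: (n_pairs,)
        basis = torch.exp(-(((r[:, None] - self.centers) / self.width) ** 2))
        return basis @ self.coeff  # (n_pairs,)


class ToyTightBinding(torch.nn.Module):
    """One orbital per atom; energy = band energy from diagonalizing H plus pair repulsion."""

    def __init__(self):
        super().__init__()
        self.eps = torch.nn.Parameter(torch.tensor(-0.5))  # on-site (atomic orbital) energy
        self.hop = PairFunction()  # off-site H_ij as a 1D function of distance
        self.rep = PairFunction()  # pairwise repulsive energy as a 1D function of distance

    def forward(self, coords, n_electrons):
        n = coords.shape[0]
        iu = torch.triu_indices(n, n, offset=1)
        r = (coords[iu[0]] - coords[iu[1]]).norm(dim=1)  # unique interatomic distances
        H = torch.zeros(n, n)
        H[iu[0], iu[1]] = self.hop(r)
        H = H + H.T + torch.eye(n) * self.eps  # symmetric model Hamiltonian
        orbital_energies = torch.linalg.eigvalsh(H)
        e_band = 2.0 * orbital_energies[: n_electrons // 2].sum()  # doubly occupied levels
        return e_band + self.rep(r).sum()


# Training loop analogous to deep learning: minimize MSE against reference energies.
# The "dataset" here is synthetic; a real workflow would fit to ab initio energies.
model = ToyTightBinding()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
geoms = [torch.rand(4, 3) * 3.0 for _ in range(32)]
targets = torch.randn(32)

for epoch in range(100):
    loss = torch.stack(
        [(model(g, n_electrons=4) - t) ** 2 for g, t in zip(geoms, targets)]
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

Because the physics is carried by the fixed Hamiltonian form, the only trainable quantities in such a setup are the one-dimensional pair functions and on-site energies; this is consistent with the abstract's points that the fitted model remains inspectable and that far less training data is needed than for a deep network.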