Investigating Data Hierarchies in Multifidelity Machine Learning for Excitation Energies.

Author information

Vinod Vivin, Zaspel Peter

Affiliation

School of Mathematics and Natural Sciences, University of Wuppertal, Wuppertal 42119 Germany.

Publication information

J Chem Theory Comput. 2025 Mar 25;21(6):3077-3091. doi: 10.1021/acs.jctc.4c01491. Epub 2025 Mar 13.

DOI:10.1021/acs.jctc.4c01491

PMID:40079624

Abstract

Recent progress in machine learning (ML) has made high-accuracy quantum chemistry (QC) calculations more accessible. Of particular interest are multifidelity machine learning (MFML) methods, where training data from differing accuracies or fidelities are used. These methods usually employ a fixed scaling factor, γ, to relate the number of training samples across different fidelities, which reflects the cost and assumed sparsity of the data. This study investigates the impact of modifying γ on model efficiency and accuracy for the prediction of vertical excitation energies using the QeMFi benchmark data set. Further, this work introduces QC compute time-informed scaling factors, denoted as θ, that vary based on QC compute times at different fidelities. A novel error metric, error contours of MFML, is proposed to provide a comprehensive view of model error contributions from each fidelity. The results indicate that high model accuracy can be achieved with just 2 training samples at the target fidelity when a larger number of samples from lower fidelities are used. This is further illustrated through a novel concept, the Γ-curve, which compares model error against the time-cost of generating training samples, demonstrating that multifidelity models can achieve high accuracy while minimizing training data costs.
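The abstract describes the fixed scaling factor γ and the time-cost axis of the Γ-curve only in words. The sketch below makes both concrete under stated assumptions. It assumes the convention commonly used in MFML work: each fidelity below the target receives γ times as many training samples as the fidelity directly above it. It also assumes that the cost of a training set is the sum over fidelities of (number of samples) × (per-sample QC compute time). The function names `gamma_schedule` and `training_cost` and the per-sample times are hypothetical illustrations. They are not taken from the paper or from QeMFi, and the paper's time-informed factor θ is not modeled here.

```python
def gamma_schedule(n_target: int, n_fidelities: int, gamma: float) -> list[int]:
    """Return training-set sizes ordered from the lowest fidelity to the target.

    Assumption (hypothetical helper): each fidelity below the target gets
    gamma times as many samples as the fidelity directly above it,
    i.e. N_{f-1} = gamma * N_f.
    """
    sizes = [n_target]  # start at the target (most expensive) fidelity
    for _ in range(n_fidelities - 1):
        # step one fidelity down and round to the nearest integer
        # (Python's round() uses round-half-to-even)
        sizes.append(int(round(sizes[-1] * gamma)))
    return sizes[::-1]  # lowest fidelity first


def training_cost(sizes: list[int], times: list[float]) -> float:
    """Total QC compute time to generate the training data: sum of N_f * t_f.

    `times` holds an assumed per-sample compute time for each fidelity,
    in the same lowest-to-target order as `sizes`.
    """
    if len(sizes) != len(times):
        raise ValueError("sizes and times must have one entry per fidelity")
    return sum(n * t for n, t in zip(sizes, times))


if __name__ == "__main__":
    # Illustrative only: five fidelities, 2 samples at the target, gamma = 2.
    sizes = gamma_schedule(n_target=2, n_fidelities=5, gamma=2)
    # Made-up per-sample times (arbitrary units), not QeMFi timings.
    times = [1.0, 2.0, 4.0, 8.0, 16.0]
    print(sizes)                        # [32, 16, 8, 4, 2]
    print(training_cost(sizes, times))  # 160.0
```

To produce error-versus-cost points of the kind a Γ-curve plots, you could sweep `n_target` and `gamma`, then record a trained model's error against `training_cost(...)` for each setting. A time-informed factor such as θ would replace the single `gamma` with fidelity-dependent ratios derived from compute times. Its exact definition is given in the paper.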

Similar articles

QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules.

Sci Data. 2025 Feb 3;12(1):202. doi: 10.1038/s41597-024-04247-3.

Multifidelity Machine Learning for Molecular Excitation Energies.

J Chem Theory Comput. 2023 Nov 14;19(21):7658-7670. doi: 10.1021/acs.jctc.3c00882. Epub 2023 Oct 20.

Predicting Molecular Energies of Small Organic Molecules With Multi-Fidelity Methods.

J Comput Chem. 2025 Mar 5;46(6):e70056. doi: 10.1002/jcc.70056.

Multifidelity Information Fusion with Machine Learning: A Case Study of Dopant Formation Energies in Hafnia.

ACS Appl Mater Interfaces. 2019 Jul 17;11(28):24906-24918. doi: 10.1021/acsami.9b02174. Epub 2019 Apr 16.

Data-Efficient Multifidelity Training for High-Fidelity Machine Learning Interatomic Potentials.

J Am Chem Soc. 2025 Jan 8;147(1):1042-1054. doi: 10.1021/jacs.4c14455. Epub 2024 Dec 17.

Multifidelity Neural Network Formulations for Prediction of Reactive Molecular Potential Energy Surfaces.

J Chem Inf Model. 2023 Apr 24;63(8):2281-2295. doi: 10.1021/acs.jcim.2c01617. Epub 2023 Apr 12.

Quasi-Classical Trajectory Calculation of Rate Constants Using an Ab Initio Trained Machine Learning Model (aML-MD) with Multifidelity Data.

J Phys Chem A. 2024 May 2;128(17):3449-3457. doi: 10.1021/acs.jpca.4c00750. Epub 2024 Apr 20.

Multilevel and multifidelity uncertainty quantification for cardiovascular hemodynamics.

Comput Methods Appl Mech Eng. 2020 Jun 15;365. doi: 10.1016/j.cma.2020.113030. Epub 2020 Apr 21.

Multifidelity regression of sparse plasma transport data available in disparate physical regimes.

Phys Rev E. 2021 Dec;104(6-2):065303. doi: 10.1103/PhysRevE.104.065303.
