Kernel based quantum machine learning at record rate: Many-body distribution functionals as compact representations.

Affiliations

Department of Chemistry, University of Toronto, St. George Campus, Toronto, Ontario M5S 1A1, Canada.

Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada.

Publication information

J Chem Phys. 2023 Jul 21;159(3). doi: 10.1063/5.0152215.

Abstract

The feature vector mapping used to represent chemical systems is a key factor governing the superior data efficiency of kernel based quantum machine learning (QML) models applicable throughout chemical compound space. Unfortunately, the most accurate representations require a high dimensional feature mapping, thereby imposing a considerable computational burden on model training and use. We introduce compact yet accurate, linear scaling QML representations based on atomic Gaussian many-body distribution functionals (MBDF) and their derivatives. Weighted density functions of MBDF values are used as global representations that are constant in size, i.e., invariant with respect to the number of atoms. We report predictive performance and training data efficiency that is competitive with state-of-the-art for two diverse datasets of organic molecules, QM9 and QMugs. Generalization capability has been investigated for atomization energies, highest occupied molecular orbital-lowest unoccupied molecular orbital eigenvalues and gap, internal energies at 0 K, zero point vibrational energies, dipole moment norm, static isotropic polarizability, and heat capacity as encoded in QM9. MBDF based QM9 performance lowers the optimal Pareto front spanned between sampling and training cost to compute node minutes, effectively sampling chemical compound space with chemical accuracy at a sampling rate of ∼48 molecules per core second.
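
The idea of a global representation whose size does not depend on the number of atoms can be illustrated with a small sketch. The code below is not the MBDF functional from the paper: the per-atom scalar values are placeholders, and the names `density_representation`, `gaussian_kernel`, `krr_fit`, and `krr_predict` are hypothetical helpers introduced only for illustration. It assumes each atom has already been mapped to a scalar descriptor value, builds a Gaussian-weighted density of those values on a fixed grid, and feeds the resulting fixed-length vectors to a plain Gaussian-kernel ridge regression.

```python
import numpy as np

def density_representation(atom_values, grid, bandwidth=0.2, weights=None):
    """Fixed-length descriptor: weighted sum of Gaussians centered on
    per-atom scalar values, evaluated on a fixed grid (illustrative only)."""
    atom_values = np.asarray(atom_values, dtype=float)
    if weights is None:
        weights = np.ones_like(atom_values)
    weights = np.asarray(weights, dtype=float)
    diff = grid[:, None] - atom_values[None, :]          # (n_grid, n_atoms)
    return (weights * np.exp(-0.5 * (diff / bandwidth) ** 2)).sum(axis=1)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma=1.0, reg=1e-8):
    """Solve (K + reg * I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, sigma=1.0):
    """Predict labels for X_new from the fitted coefficients."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Toy "molecules" with different atom counts (placeholder per-atom values).
grid = np.linspace(-1.0, 1.0, 32)
mol_a = [0.1, -0.3, 0.5]
mol_b = [0.2, 0.2, -0.4, 0.6, -0.1]
X = np.stack([density_representation(m, grid) for m in (mol_a, mol_b)])
y = np.array([1.0, 2.0])                                 # made-up labels

alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
```

Both toy molecules map to vectors of length 32 regardless of atom count, and reordering the atoms leaves the vector unchanged because the density is a sum over atoms. The grid size, bandwidth, kernel width, and regularization are arbitrary choices for this sketch, not values taken from the paper.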
