Effective Molecular Descriptors for Chemical Accuracy at DFT Cost: Fragmentation, Error-Cancellation, and Machine Learning.

Affiliation

Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States.

Publication Information

J Chem Theory Comput. 2020 Aug 11;16(8):4938-4950. doi: 10.1021/acs.jctc.0c00236. Epub 2020 Jul 17.

Abstract

Recent advances in theoretical thermochemistry have allowed the study of small organic and bio-organic molecules with high accuracy. However, applications to larger molecules are still impeded by the steep computational scaling of highly accurate quantum mechanical (QM) methods, forcing the use of approximate, more cost-effective methods at greatly reduced accuracy. One of the most successful strategies to mitigate this error is the use of systematic error-cancellation schemes, in which highly accurate QM calculations can be performed on small portions of the molecule to construct corrections to an approximate method. Herein, we build on ideas from fragmentation and error-cancellation to introduce a new family of molecular descriptors for machine learning modeled after the Connectivity-Based Hierarchy (CBH) of generalized isodesmic reaction schemes. The best-performing descriptor, ML(CBH-2), is constructed from fragments preserving only the immediate connectivity of all heavy (non-H) atoms of a molecule, along with the overlapping regions of fragments in accordance with the inclusion-exclusion principle. Our proposed approach offers a simple, chemically intuitive grouping of atoms, tuned with an optimal amount of error-cancellation, and outperforms previous structure-based descriptors using a much smaller input vector length. For a wide variety of density functionals, DFT+ΔML(CBH-2) models, trained on a set of small- to medium-sized organic HCNOSCl-containing molecules, achieved an out-of-sample MAE within 0.5 kcal/mol and a 2σ (95%) confidence interval of <1.5 kcal/mol compared to accurate G4 reference values at DFT cost.
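The abstract describes ML(CBH-2) only at a high level. As a minimal sketch, assuming a molecule is supplied as a single connected heavy-atom graph (element symbols plus a bond list) and that the descriptor is a signed count of fragment types, the CBH-2 bookkeeping can be written as below: each heavy atom with at least two heavy neighbors contributes one atom-centered fragment, and each bond shared by two such centers is subtracted as the overlap (inclusion-exclusion). The helper names `cbh2_fragment_counts` and `to_vector` are hypothetical, and fragments here are keyed only by element symbols; bond orders, hydrogen counts, and any special ring handling are ignored, so this is not necessarily the authors' encoding.

```python
from collections import Counter


def cbh2_fragment_counts(elements, bonds):
    """Signed counts of CBH-2-style fragments for one molecule (hypothetical helper).

    elements: heavy-atom symbols, e.g. ["C", "C", "O"]
    bonds:    heavy-atom bonds as index pairs, e.g. [(0, 1), (1, 2)]
    Assumption: hydrogens, bond orders and stereochemistry are ignored.
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Central atoms have >= 2 heavy neighbors; a terminal atom's fragment is
    # already contained in the fragment of the atom it is attached to.
    centers = {i for i, nbrs in neighbors.items() if len(nbrs) >= 2}

    counts = Counter()
    if not centers:
        # One or two heavy atoms: the molecule is its own fragment.
        if bonds:
            i, j = bonds[0]
            counts["-".join(sorted((elements[i], elements[j])))] += 1
        else:
            counts[elements[0]] += 1
        return counts

    # Products: one atom-centered fragment per central atom (center + neighbors).
    for i in centers:
        nbr_symbols = ",".join(sorted(elements[j] for j in neighbors[i]))
        counts[elements[i] + "(" + nbr_symbols + ")"] += 1

    # Reactants: adjacent centered fragments overlap in the bond between
    # their centers, so that bond fragment is subtracted once.
    for i, j in bonds:
        if i in centers and j in centers:
            counts["-".join(sorted((elements[i], elements[j])))] -= 1
    return counts


def to_vector(counts, vocabulary):
    """Map a fragment Counter onto a fixed, ordered list of fragment keys."""
    return [counts.get(key, 0) for key in vocabulary]


# n-butane: 2 x propane-like C(C,C) fragments minus 1 shared C-C (ethane) overlap
butane = cbh2_fragment_counts(["C", "C", "C", "C"], [(0, 1), (1, 2), (2, 3)])
print(dict(butane))  # {'C(C,C)': 2, 'C-C': -1}
```

For n-butane this reproduces the familiar CBH-2 reaction butane + ethane → 2 propane. In a Δ-learning setup, fixed-length vectors from `to_vector` would be paired with the difference between G4 and DFT energies as the regression target; the choice of regressor is not shown here.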
