Machine learning to tame divergent density functional approximations: a new path to consensus materials design principles.

Authors

Duan Chenru, Chen Shuxin, Taylor Michael G, Liu Fang, Kulik Heather J

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Publication information

Chem Sci. 2021 Sep 2;12(39):13021-13036. doi: 10.1039/d1sc03701c. eCollection 2021 Oct 13.

Abstract

Virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for cases with challenging electronic structure (e.g., open-shell transition-metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families, "rungs" (e.g., semi-local to double hybrid) and basis sets on over 2000 TMCs. Although computed property values (e.g., spin state splitting and frontier orbital gap) differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance, providing DFA-invariant, universal design rules. We devise a strategy to train artificial neural network (ANN) models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 187k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of computational lead compounds with literature-mined, experimental compounds over the typically employed single-DFA approach.
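
The two ideas in the abstract that lend themselves to a small illustration are the linear correlation of a property across DFAs and the consensus requirement for lead compounds. The sketch below is not the authors' workflow: the abstract does not define the consensus criterion, so the rule used here (a compound is a lead if a minimum fraction of DFA-specific predictions falls inside a target window) is an assumption, as are the function names `pairwise_pearson` and `consensus_leads`, the window bounds, and the toy numbers.

```python
import numpy as np


def pairwise_pearson(preds):
    """Pearson correlation between DFAs.

    preds has shape (n_dfas, n_compounds); each row holds one DFA's
    predicted property values. Returns an (n_dfas, n_dfas) matrix.
    """
    return np.corrcoef(preds)


def consensus_leads(preds, lo, hi, min_fraction=0.5):
    """Flag compounds whose property lies in [lo, hi] for enough DFAs.

    ASSUMPTION: this fraction-in-window rule is only one plausible
    reading of "consensus"; the source abstract does not specify it.
    """
    in_window = (preds >= lo) & (preds <= hi)   # (n_dfas, n_compounds)
    fraction = in_window.mean(axis=0)           # share of DFAs agreeing
    return fraction >= min_fraction, fraction


# Toy, made-up spin-splitting predictions (illustrative units, kcal/mol)
# for 4 hypothetical compounds from 3 hypothetical DFAs.
preds = np.array([
    [-12.0, 1.0, 4.0, 30.0],   # hypothetical DFA A
    [-6.0, 4.0, 6.0, 38.0],    # hypothetical DFA B
    [-20.0, -3.0, 3.0, 22.0],  # hypothetical DFA C
])

corr = pairwise_pearson(preds)
is_lead, fraction = consensus_leads(preds, lo=-5.0, hi=5.0, min_fraction=0.6)

# Single-DFA screening with DFA B alone, for comparison.
single_dfa_lead = (preds[1] >= -5.0) & (preds[1] <= 5.0)
```

In this toy data the absolute values differ by DFA while the rows remain strongly correlated, and the third compound is flagged by the consensus rule but missed when screening with hypothetical DFA B alone. That is the kind of single-DFA sensitivity the abstract describes.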

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7c4/8513898/312659e7c881/d1sc03701c-f1.jpg
