Li Zhuo, Zhao Changquan, Wang Haikun, Ding Yanqing, Chen Yechao, Schwaller Philippe, Yang Ke, Hua Cheng, He Yulian
University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China.
School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
Proc Natl Acad Sci U S A. 2024 Mar 19;121(12):e2320232121. doi: 10.1073/pnas.2320232121. Epub 2024 Mar 13.
The chemisorption energy of reactants on a catalyst surface, [Formula: see text], is among the most informative characteristics for understanding and pinpointing the optimal catalyst. The intrinsic complexity of catalyst surfaces and chemisorption reactions makes it difficult to identify the pivotal physical quantities that determine [Formula: see text]. In response, this study proposes a methodology, the feature deletion experiment, based on Automatic Machine Learning (AutoML) for extracting knowledge from a high-throughput density functional theory (DFT) database. The study reveals that, for binary alloy surfaces, the geometric information of the local adsorption site is the primary physical quantity determining [Formula: see text], ahead of the electronic and physicochemical properties of the catalyst alloys. By integrating the feature deletion experiment with instance-wise variable selection (INVASE), a neural network-based explainable AI (XAI) tool, we established the best-performing feature set, which contains 21 intrinsic, non-DFT-computed properties and achieves a mean absolute error (MAE) of 0.23 eV across a periodic table-wide chemical space involving more than 1,600 types of alloy surfaces and 8,400 chemisorption reactions. This study demonstrates the stability, consistency, and potential of the AutoML-based feature deletion experiment in developing concise, predictive, and theoretically meaningful models for complex chemical problems with minimal human intervention.
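The idea behind a feature deletion experiment can be illustrated with a minimal sketch: remove one group of features at a time, refit a model, and compare the held-out MAE against the model trained on all features. The sketch below is not the authors' pipeline. It assumes an ordinary least-squares fit as a stand-in for the AutoML model, uses synthetic data rather than the DFT database, and the group names, column assignments, and function names (`fit_predict_mae`, `feature_deletion_experiment`) are hypothetical. The synthetic target is deliberately built to depend mostly on the "geometric" columns, so the resulting ranking is an artifact of the construction and not a reproduction of the reported result.

```python
import numpy as np


def fit_predict_mae(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares (a stand-in for an AutoML model); return test MAE."""
    # Append a constant column so the linear model has an intercept.
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.mean(np.abs(A_test @ coef - y_test)))


def feature_deletion_experiment(X, y, groups, n_train):
    """Return the baseline MAE and a dict {group: MAE after deleting that group}."""
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]
    baseline = fit_predict_mae(X_tr, y_tr, X_te, y_te)
    results = {}
    for name, cols in groups.items():
        # Keep every column that does not belong to the deleted group.
        keep = [j for j in range(X.shape[1]) if j not in cols]
        results[name] = fit_predict_mae(X_tr[:, keep], y_tr, X_te[:, keep], y_te)
    return baseline, results


# Synthetic data (assumption): six features split into three hypothetical groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
# The target depends strongly on columns 0-1, weakly on column 2, not on 3-5.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * X[:, 2] + 0.05 * rng.normal(size=400)

groups = {"geometric": [0, 1], "electronic": [2, 3], "physicochemical": [4, 5]}
baseline, results = feature_deletion_experiment(X, y, groups, n_train=300)

# Rank groups by how much the MAE grows when the group is deleted.
ranking = sorted(results, key=lambda g: results[g] - baseline, reverse=True)
print(f"baseline MAE: {baseline:.3f}")
for g in ranking:
    print(f"without {g}: MAE = {results[g]:.3f}")
```

A group whose deletion raises the MAE the most carries the most information that the remaining features cannot replace. In practice the fit would be repeated over several train/test splits to check that the ranking is stable.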