Kammeraad Joshua A, Goetz Jack, Walker Eric A, Tewari Ambuj, Zimmerman Paul M
Department of Chemistry, University of Michigan, 930 North University Avenue, Ann Arbor, Michigan 48109, United States.
Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, United States.
J Chem Inf Model. 2020 Mar 23;60(3):1290-1301. doi: 10.1021/acs.jcim.9b00721. Epub 2020 Mar 3.
In a departure from conventional chemical approaches, data-driven models of chemical reactions have recently been shown to be statistically successful using machine learning. These models, however, are largely black box in character and have not provided the kind of chemical insights that historically advanced the field of chemistry. To examine the knowledge base of machine-learning models (what does the machine learn?), this article deconstructs black-box machine-learning models of a diverse chemical reaction data set. Through experimentation with chemical representations and modeling techniques, the analysis provides insights into how statistical accuracy can arise even when the model lacks informative physical principles. By peeling back the layers of these complicated models, we arrive at a minimal, chemically intuitive model that involves no machine learning. This model is based on systematic reaction-type classification and Evans-Polanyi relationships within reaction types, which are easily visualized and interpreted. Through exploring this simple model, we gain a deeper understanding of the data set and uncover a means for expert interactions to improve the model's reliability.
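As a rough illustration of the kind of minimal model the abstract describes, the sketch below fits one Evans-Polanyi line per reaction type, assuming the usual linear form E_a = alpha * dE + E_0 between activation energy and reaction energy. The function names, the reaction-type labels, and the numbers are hypothetical and chosen only for illustration; the abstract does not specify the classification scheme, the data, or the fitting procedure the authors used.

```python
from collections import defaultdict

import numpy as np


def fit_evans_polanyi(reaction_types, reaction_energies, activation_energies):
    """Fit Ea = alpha * dE + e0 separately for each reaction type.

    Returns a dict mapping reaction type -> (alpha, e0).
    """
    grouped = defaultdict(list)
    for rtype, d_e, e_a in zip(reaction_types, reaction_energies, activation_energies):
        grouped[rtype].append((d_e, e_a))

    params = {}
    for rtype, pairs in grouped.items():
        d_e, e_a = np.array(pairs, dtype=float).T
        if len(pairs) < 2 or d_e.max() == d_e.min():
            # Not enough distinct points to define a line:
            # fall back to a flat line at the mean barrier (an assumption).
            params[rtype] = (0.0, float(e_a.mean()))
        else:
            # Ordinary least-squares line through the (dE, Ea) points.
            alpha, e0 = np.polyfit(d_e, e_a, 1)
            params[rtype] = (float(alpha), float(e0))
    return params


def predict_barrier(params, reaction_type, reaction_energy):
    """Predict an activation energy from the fitted line for one reaction type."""
    alpha, e0 = params[reaction_type]
    return alpha * reaction_energy + e0


# Made-up energies (arbitrary units) for two hypothetical reaction types.
types = ["H-transfer", "H-transfer", "H-transfer", "ring-closure", "ring-closure"]
d_e = [-10.0, 0.0, 10.0, -20.0, 0.0]
e_a = [15.0, 20.0, 25.0, 30.0, 36.0]

params = fit_evans_polanyi(types, d_e, e_a)
for rtype, (alpha, e0) in params.items():
    print(f"{rtype}: alpha={alpha:.2f}, e0={e0:.2f}")
```

Because each reaction type has only two parameters, the fitted slope and intercept can be plotted and inspected directly, which is the sense in which such a model is easy to visualize and interpret.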