Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR, China.
Molecules. 2023 Jun 13;28(12):4730. doi: 10.3390/molecules28124730.
Machine learning has revolutionized information processing for large datasets across various fields. However, its limited interpretability poses a significant challenge when applied to chemistry. In this study, we developed a set of simple molecular representations to capture the structural information of ligands in palladium-catalyzed Sonogashira coupling reactions of aryl bromides. Drawing inspiration from human understanding of catalytic cycles, we used a graph neural network to extract structural details of the phosphine ligand, a major contributor to the overall activation energy. We combined these simple molecular representations with an electronic descriptor of aryl bromide as inputs for a fully connected neural network unit. The results allowed us to predict rate constants and gain mechanistic insights into the rate-limiting oxidative addition process using a relatively small dataset. This study highlights the importance of incorporating domain knowledge in machine learning and presents an alternative approach to data analysis.
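The architecture described above (a graph neural network that encodes the phosphine ligand, with its output joined to an electronic descriptor of the aryl bromide and passed to a fully connected unit) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The layer sizes, the mean-pool readout, the one-hot atom features, the toy PMe3 graph, the scalar `sigma` standing in for the electronic descriptor, and all function names are assumptions made for illustration. The weights are random and untrained, so the output has no chemical meaning.

```python
import numpy as np

def message_passing(adj, feats, weight):
    """One graph-convolution step: average each atom with its bonded
    neighbours (self-loop included), then apply a linear map and ReLU."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)   # per-atom degree, always >= 1
    return np.maximum((a_hat / deg) @ feats @ weight, 0.0)

def ligand_embedding(adj, feats, gnn_weights):
    """Stack message-passing layers, then mean-pool atoms into one vector."""
    h = feats
    for w in gnn_weights:
        h = message_passing(adj, h, w)
    return h.mean(axis=0)                    # order-independent readout

def predict_log_k(adj, feats, sigma, params):
    """Concatenate the ligand embedding with the aryl bromide descriptor
    and pass the result through a small fully connected network."""
    z = ligand_embedding(adj, feats, params["gnn"])
    x = np.concatenate([z, [sigma]])
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    return float(hidden @ params["w2"] + params["b2"])

def init_params(n_feat, n_hidden, rng):
    """Random, untrained weights (hypothetical sizes)."""
    return {
        "gnn": [rng.normal(scale=0.5, size=(n_feat, n_hidden)),
                rng.normal(scale=0.5, size=(n_hidden, n_hidden))],
        "w1": rng.normal(scale=0.5, size=(n_hidden + 1, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(scale=0.5, size=n_hidden),
        "b2": 0.0,
    }

# Toy ligand graph (assumption): PMe3 with heavy atoms only, P bonded to three C.
adj = np.array([[0, 1, 1, 1],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0]], dtype=float)
feats = np.array([[1, 0],                    # one-hot atom type: [P, C]
                  [0, 1],
                  [0, 1],
                  [0, 1]], dtype=float)

rng = np.random.default_rng(0)
params = init_params(n_feat=2, n_hidden=8, rng=rng)
# sigma is a hypothetical scalar electronic descriptor for the aryl bromide.
log_k = predict_log_k(adj, feats, sigma=0.23, params=params)
```

In a sketch like this, training would fit `params` to measured rate constants. Keeping the ligand encoder separate from the substrate descriptor mirrors the division of inputs described in the abstract, and the mean-pool readout makes the prediction independent of atom ordering.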