Hefei National Laboratory for Physical Sciences at the Microscale, CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Clean Energy, Center for Excellence in Molecular Synthesis of CAS, Institute of Energy, Hefei Comprehensive National Science Center, University of Science and Technology of China, Hefei 230026, China.
Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China.
J Chem Inf Model. 2022 Sep 26;62(18):4319-4328. doi: 10.1021/acs.jcim.2c00696. Epub 2022 Sep 12.
The quantitative relationship between chemical reaction rates and nucleophilicity parameters plays a crucial role in organic chemistry. In this regard, the formula proposed by Mayr et al. and the associated reactivity database are important representatives. However, determining Mayr's nucleophilicity parameter N often requires time-consuming experiments with reference electrophiles in the solvent. Several machine learning (ML)-based models have been proposed in recent years for data-driven prediction of N. However, in addition to DFT-calculated electronic descriptors, most of them also take a set of manually predefined structural descriptors as input, which may bias the representation of the nucleophile's structural information toward the preferences built into the descriptors' definitions. Compared with traditional ML algorithms, graph neural networks (GNNs) can naturally take the molecule's structural information into account through message passing. We herein propose a SchNet-based GNN model that takes only the molecular conformation and solvent type as input. The model achieves performance comparable to the previous benchmark study in 10-fold cross-validation on 894 data points (R² = 0.91, RMSE = 2.25). To enhance the model's ability to capture the molecule's electronic information, some DFT-calculated parameters are then incorporated into the model as graph-level global features, which substantially improves prediction accuracy (R² = 0.95, RMSE = 1.63). These results demonstrate that both structural and electronic information are important for the prediction of N, and that GNNs can integrate these two kinds of information more effectively.
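For orientation, the formula of Mayr et al. mentioned in the abstract is usually written as log k(20 °C) = s_N(N + E), where k is the second-order rate constant, E is the electrophilicity parameter of the reaction partner, and s_N is a nucleophile-specific sensitivity parameter. The minimal sketch below evaluates this relation and computes the two metrics quoted in the abstract, assuming R² denotes the coefficient of determination. The function names and all numeric values are hypothetical illustrations, not code or data from the study.

```python
import math


def mayr_log_k(n_param, s_n, e_param):
    """Return log10(k) at 20 °C from the Mayr relation log k = s_N * (N + E)."""
    return s_n * (n_param + e_param)


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical nucleophile (N, s_N) paired with a hypothetical electrophile (E).
log_k = mayr_log_k(n_param=15.0, s_n=0.8, e_param=-10.0)

# Hypothetical reference vs. predicted N values, for illustration only.
n_reference = [10.0, 12.0, 14.0, 16.0]
n_predicted = [11.0, 11.0, 15.0, 15.0]
print(log_k, rmse(n_reference, n_predicted), r_squared(n_reference, n_predicted))
```

Because N enters the relation multiplied by s_N, an error of one unit in a predicted N shifts the estimated log k by s_N units, which is why the RMSE values quoted above matter in practice.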