Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou 510006, China.
Key Laboratory of Theoretical Chemistry of Environment, Ministry of Education, School of Chemistry and Environment, South China Normal University, South Circle at University City, Guangzhou 510006, China.
J Chem Inf Model. 2020 Mar 23;60(3):1165-1174. doi: 10.1021/acs.jcim.9b00929. Epub 2020 Feb 17.
The copper(I)-catalyzed alkyne-azide cycloaddition (CuAAC) reaction, a major click chemistry reaction, is widely employed in drug discovery and chemical biology. However, the success rate of the CuAAC reaction is not as high as expected. To improve its performance, we developed a recurrent neural network (RNN) model to predict reaction feasibility. First, we designed and synthesized a structurally diverse library of 700 compounds via the CuAAC reaction to obtain experimental data. Then, using reaction SMILES as input, we built a bidirectional long short-term memory model with a self-attention mechanism (BiLSTM-SA). Our best prediction model achieved an overall accuracy of 80%. Using the self-attention mechanism, we recognized adverse substructures responsible for unsuccessful (negative) reactions and derived them as quantitative descriptors. Density functional theory (DFT) calculations provided evidence for the correlation between the hybridization type of the bromo-α-C and the success rate of the reaction. Combining these quantitative descriptors with RDKit descriptors as input to three machine learning models (a support vector machine, a random forest, and logistic regression) improved their performance. The BiLSTM-SA model outperforms conventional machine learning methods in predicting the feasibility of the CuAAC reaction and advances heuristic chemical rules.
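The abstract does not give implementation details, so the following is only a minimal, untrained sketch of how a reaction SMILES can flow through a tokenizer, per-token vectors, and an attention-pooling layer to a feasibility probability, and how attention weights can point at individual tokens. Several parts are assumptions, not the authors' method: the tokenizer uses a regex pattern commonly applied to SMILES; the BiLSTM is replaced by fixed random vectors per token type; the attention is a generic additive form (score = v · tanh(W h)); and all function names (`tokenize_reaction`, `additive_self_attention`, `predict_feasibility`) are hypothetical.

```python
import math
import random
import re

# Regex commonly used to split SMILES into atom/bond-level tokens
# (bracket atoms, Br/Cl, aromatic atoms, bonds, branches, ring digits).
# ASSUMPTION: the paper's exact tokenization scheme is not stated.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_reaction(rxn_smiles):
    """Split a reaction SMILES into tokens; reject characters the regex drops."""
    tokens = SMILES_TOKEN_RE.findall(rxn_smiles)
    if "".join(tokens) != rxn_smiles:
        raise ValueError(f"untokenizable characters in {rxn_smiles!r}")
    return tokens


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_self_attention(hidden, w_proj, v):
    """score_t = v . tanh(W h_t); alpha = softmax(score); pooled = sum_t alpha_t h_t."""
    scores = []
    for h in hidden:
        u = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in w_proj]
        scores.append(sum(a * b for a, b in zip(v, u)))
    alphas = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[k] for a, h in zip(alphas, hidden)) for k in range(dim)]
    return alphas, pooled


def predict_feasibility(rxn_smiles, dim=8, attn_dim=4, seed=0):
    """HYPOTHETICAL untrained forward pass: tokens -> vectors -> attention -> sigmoid."""
    tokens = tokenize_reaction(rxn_smiles)
    rng = random.Random(seed)
    # Stand-in for BiLSTM outputs: one fixed random vector per token type
    # (sorted for reproducibility). A real BiLSTM would make each vector
    # depend on the surrounding context in both directions.
    embed = {t: [rng.uniform(-1, 1) for _ in range(dim)] for t in sorted(set(tokens))}
    hidden = [embed[t] for t in tokens]
    w_proj = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(attn_dim)]
    v = [rng.uniform(-1, 1) for _ in range(attn_dim)]
    w_out = [rng.uniform(-1, 1) for _ in range(dim)]
    alphas, pooled = additive_self_attention(hidden, w_proj, v)
    logit = sum(w * x for w, x in zip(w_out, pooled))
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> P(reaction succeeds)
    return tokens, alphas, prob


if __name__ == "__main__":
    # Phenylacetylene + benzyl azide -> 1-benzyl-4-phenyl-1H-1,2,3-triazole
    rxn = "C#Cc1ccccc1.[N-]=[N+]=NCc1ccccc1>>c1ccc(Cn2cc(-c3ccccc3)nn2)cc1"
    tokens, alphas, prob = predict_feasibility(rxn)
    top = sorted(zip(alphas, range(len(tokens))), reverse=True)[:3]
    for a, i in top:
        print(f"token {i:2d} {tokens[i]!r:8} attention={a:.3f}")
    print(f"P(feasible) = {prob:.3f} (weights are untrained)")
```

Because the weights are random, the printed attention values and probability carry no chemical meaning; after training on labeled reactions, tokens that repeatedly receive high attention in failed reactions would be the candidates for adverse substructures.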