Li Shih-Cheng, Wang Pei-Hua, Su Jheng-Wei, Chiang Wei-Yin, Yeh Tzu-Lan, Zhavoronkov Alex, Huang Shih-Hsien, Lin Yen-Chu, Ou Chia-Ho, Chen Chih-Yu
Insilico Medicine Taiwan Ltd, Suite C830, 8F., No. 563, Sec. 4, Zhongxiao East Road, Xinyi District, Taipei, 110058, Taiwan.
Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, USA.
J Cheminform. 2025 Jul 14;17(1):105. doi: 10.1186/s13321-025-01043-y.
Finding optimal reaction conditions is crucial for chemical synthesis in the pharmaceutical and chemical industries. However, because the chemical space is vast, running experiments for every possible combination is impractical. Quantitative structure-activity relationship (QSAR) models have therefore been widely used to predict product yields, but evaluating all combinations remains computationally intensive. In this work, we demonstrate that a Digital Annealer Unit (DAU) can tackle these large-scale optimization problems more efficiently. Two types of models are developed and tested on high-throughput experimentation (HTE) and Reaxys datasets. Our results suggest that the models perform comparably to classical machine learning (ML) methods (i.e., Random Forest and Multilayer Perceptron (MLP)), while inference takes only seconds on a DAU. In active learning and autonomous reaction condition design, our model's reaction yield predictions improve as new data are incorporated, meaning that it can potentially be used in iterative processes. Our method can also accelerate the screening of billions of reaction conditions, identifying superior conditions millions of times faster than traditional computing units.
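The abstract does not spell out the model form, so the following is only an illustrative sketch of how condition screening can be posed for annealing-style hardware, which typically minimises a quadratic function of binary variables (a QUBO). Everything here is an assumption made for illustration: the three condition groups and their sizes, the random quadratic yield surrogate, the penalty weight, and the use of a plain software simulated annealer as a stand-in for a DAU. It is not the authors' model or the DAU interface.

```python
import itertools
import math
import random

# Hypothetical condition groups and option counts (not taken from the paper).
GROUPS = {"catalyst": 3, "solvent": 4, "base": 3}
PENALTY = 5.0  # assumed weight for the "exactly one option per group" constraint


def build_index(groups):
    """Assign one binary variable to every (group, option) pair."""
    index = {}
    for group, n_options in groups.items():
        for option in range(n_options):
            index[(group, option)] = len(index)
    return index


def random_surrogate(index, rng):
    """Toy yield surrogate: linear terms plus pairwise terms across groups."""
    lin = {i: rng.uniform(0.0, 1.0) for i in index.values()}
    quad = {}
    for (ga, _), i in index.items():
        for (gb, _), j in index.items():
            if i < j and ga != gb:
                quad[(i, j)] = rng.uniform(-0.5, 0.5)
    return lin, quad


def predicted_yield(x, lin, quad):
    """Surrogate yield of a binary vector x (arbitrary units)."""
    total = sum(w * x[i] for i, w in lin.items())
    total += sum(w * x[i] * x[j] for (i, j), w in quad.items())
    return total


def qubo_energy(x, lin, quad, groups, index, penalty):
    """QUBO objective: negative yield plus a one-hot penalty per group."""
    energy = -predicted_yield(x, lin, quad)
    for group, n_options in groups.items():
        chosen = sum(x[index[(group, o)]] for o in range(n_options))
        energy += penalty * (chosen - 1) ** 2
    return energy


def anneal(energy, n_vars, rng, steps=20000, t_start=2.0, t_end=0.01):
    """Single-bit-flip simulated annealing with a geometric cooling schedule."""
    x = [rng.randint(0, 1) for _ in range(n_vars)]
    e = energy(x)
    best_x, best_e = x[:], e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_vars)
        x[i] ^= 1  # propose flipping one bit
        e_new = energy(x)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            x[i] ^= 1  # reject: undo the flip
    return best_x, best_e


rng = random.Random(0)
index = build_index(GROUPS)
lin, quad = random_surrogate(index, rng)


def energy(x):
    return qubo_energy(x, lin, quad, GROUPS, index, PENALTY)


best_x, best_e = anneal(energy, len(index), rng)

# Exhaustive reference over all valid one-hot combinations (only 36 here).
brute_e = math.inf
for choice in itertools.product(*(range(n) for n in GROUPS.values())):
    x = [0] * len(index)
    for group, option in zip(GROUPS, choice):
        x[index[(group, option)]] = 1
    brute_e = min(brute_e, energy(x))

print("annealed energy:", best_e)
print("exhaustive optimum:", brute_e)
```

With only 36 valid combinations the exhaustive loop is trivial, so it serves as a reference for the annealer. The point of the encoding is that the number of binary variables grows with the sum of the option counts, while the number of combinations grows with their product, which is what makes exhaustive evaluation impractical at the scale the abstract describes.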