Chen Lung-Yi, Li Yi-Pei
Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan.
Taiwan International Graduate Program on Sustainable Chemical Science and Technology (TIGP-SCST), No. 128, Sec. 2, Academia Road, Taipei, 11529, Taiwan.
J Cheminform. 2024 Jan 24;16(1):11. doi: 10.1186/s13321-024-00805-4.
In chemical synthesis planning, accurate recommendation of reaction conditions is essential for a successful outcome. This work introduces a deep learning approach for predicting suitable reagents, solvents, and reaction temperatures for chemical reactions. The method combines a multi-label classification model with a ranking model to provide tailored reaction condition recommendations based on relevance scores derived from anticipated product yields. To address the scarcity of data on unfavorable reaction conditions, we used hard negative sampling to generate reaction conditions that might be mistakenly classified as suitable, forcing the model to refine its decision boundaries, especially in challenging cases. The model proposes conditions for which an exact match to the recorded solvents and reagents appears within the top-10 predictions 73% of the time. It also predicts temperatures within ±20 °C of the recorded temperature in 89% of test cases. Notably, the model can recommend multiple viable reaction conditions, with accuracy varying according to the number of condition records available for each reaction. The model can also suggest alternative reaction conditions beyond those recorded in the dataset, which underscores its potential to inspire new approaches in chemical synthesis planning and reaction engineering. Scientific contribution: The combination of multi-label classification and ranking models provides tailored recommendations for reaction conditions based on reaction yields. A novel data augmentation approach is presented to address the scarcity of data on negative reaction conditions.
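The two headline metrics, top-10 exact match on solvents and reagents and temperature agreement within a ±20 tolerance, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the representation of a condition as a `frozenset` of chemical names, the assumption that temperatures are in °C, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Sequence

# Hypothetical representation: one condition = the set of solvent/reagent names.
Condition = FrozenSet[str]


def top_k_exact_match(ranked: Sequence[Sequence[Condition]],
                      recorded: Sequence[Condition],
                      k: int = 10) -> float:
    """Fraction of reactions whose recorded condition is among the top-k ranked ones."""
    hits = sum(1 for preds, truth in zip(ranked, recorded)
               if truth in list(preds)[:k])
    return hits / len(recorded)


def temperature_within(predicted: Sequence[float],
                       recorded: Sequence[float],
                       tol: float = 20.0) -> float:
    """Fraction of predictions within +/- tol of the recorded temperature (assumed degrees C)."""
    hits = sum(1 for p, t in zip(predicted, recorded) if abs(p - t) <= tol)
    return hits / len(recorded)


# Toy data (invented): ranked condition lists for two reactions, best first.
ranked_conditions = [
    [frozenset({"THF", "NaH"}), frozenset({"DMF", "K2CO3"})],
    [frozenset({"MeOH"}), frozenset({"EtOH", "NaOH"})],
]
recorded_conditions = [frozenset({"DMF", "K2CO3"}), frozenset({"toluene"})]

exact_match_rate = top_k_exact_match(ranked_conditions, recorded_conditions, k=10)
temperature_rate = temperature_within([25.0, 80.0, 110.0], [20.0, 120.0, 100.0])
print(exact_match_rate, temperature_rate)
```

Using sets makes the match insensitive to the order in which solvents and reagents are listed, which is one reasonable reading of "exact match"; the paper may define it differently.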