Singh Sukriti, Sunoj Raghavan B
Department of Chemistry, Indian Institute of Technology Bombay, Mumbai 400076, India.
Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India.
iScience. 2022 Jun 22;25(7):104661. doi: 10.1016/j.isci.2022.104661. eCollection 2022 Jul 15.
Sustainable practices in the chemical sciences can be better realized through interdisciplinary approaches that apply machine learning (ML) to the small datasets initially acquired during reaction discovery. Developing new reactions generally remains heuristic, as well as time- and resource-intensive. For instance, the synthesis of fluorine-containing compounds, which constitute ∼20% of marketed drugs, relies on the deoxyfluorination of abundantly available alcohols. Herein, we demonstrate that a recurrent neural network (RNN)-based deep generative model, built on a library of just 37 alcohols, can effectively learn and explore the relevant chemical space. This proof-of-concept ML model generates good-quality, synthetically accessible, higher-yielding novel alcohols. The protocol is therefore well suited for deployment in a practical reaction discovery pipeline.
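The abstract names an RNN-based deep generative model but does not describe its architecture. Purely as a hedged illustration of the general idea behind character-level RNN generators, the sketch below assumes molecules are written as SMILES strings and shows the autoregressive sampling loop: the network reads one character, updates its hidden state, and samples the next character until an end token appears. The vocabulary, the start/end tokens `^` and `$`, the `TinyCharRNN` class, its hidden size, and the absence of bias terms are all hypothetical choices. The weights are random and untrained, so the output is not chemically meaningful. This is not the authors' model, which would need to be trained on SMILES data (such as the 37-alcohol library) and whose samples would still require validity and synthetic-accessibility checks.

```python
import math
import random

# Hypothetical character vocabulary for simple alcohol SMILES.
# "^" marks the start of a sequence and "$" marks its end.
VOCAB = ["^", "$", "C", "O", "(", ")", "1", "c", "="]
CHAR_TO_IDX = {ch: i for i, ch in enumerate(VOCAB)}


class TinyCharRNN:
    """Single-layer Elman RNN (no biases) with untrained, randomly initialised weights."""

    def __init__(self, vocab_size, hidden_size=16, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_xh = init(hidden_size, vocab_size)   # input -> hidden
        self.w_hh = init(hidden_size, hidden_size)  # hidden -> hidden
        self.w_hy = init(vocab_size, hidden_size)   # hidden -> output logits

    def step(self, token_idx, h):
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}); x_t is one-hot, so W_xh x_t is one column.
        new_h = [
            math.tanh(self.w_xh[i][token_idx]
                      + sum(self.w_hh[i][j] * h[j] for j in range(self.hidden_size)))
            for i in range(self.hidden_size)
        ]
        # One logit per vocabulary character.
        logits = [sum(row[j] * new_h[j] for j in range(self.hidden_size)) for row in self.w_hy]
        return logits, new_h


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a sampling temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_smiles(model, max_len=30, temperature=1.0, seed=None):
    """Autoregressively sample one string, feeding each sampled character back in."""
    rng = random.Random(seed)
    h = [0.0] * model.hidden_size
    token = CHAR_TO_IDX["^"]
    out = []
    for _ in range(max_len):
        logits, h = model.step(token, h)
        probs = softmax(logits, temperature)
        probs[CHAR_TO_IDX["^"]] = 0.0  # never emit the start token mid-sequence
        token = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        if VOCAB[token] == "$":
            break
        out.append(VOCAB[token])
    return "".join(out)


if __name__ == "__main__":
    model = TinyCharRNN(len(VOCAB))
    for s in range(5):
        print(sample_smiles(model, seed=s))
```

In a real pipeline, the sampling step would follow training on SMILES data, and each generated string would then be filtered, for example by checking that a cheminformatics toolkit such as RDKit can parse it, before any yield or accessibility scoring. Lowering `temperature` makes sampling more conservative; raising it increases diversity at the cost of more invalid strings.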