Chung Jihoon, Li Justin, Saimon Amirul Islam, Hong Pengyu, Kong Zhenyu
Department of Industrial Engineering, Pusan National University, Busan, Korea.
Management, Entrepreneurship, and Technology, University of California, Berkeley, CA, USA.
Sci Rep. 2024 May 27;14(1):12131. doi: 10.1038/s41598-024-62158-0.
Stereoselective reactions have played a vital role in the emergence of life, evolution, human biology, and medicine. For a long time, however, most industrial and academic efforts in asymmetric synthesis have followed a trial-and-error approach. In addition, most previous studies have focused qualitatively on the influence of steric and electronic effects on stereoselective reactions. As a result, quantitatively understanding the stereoselectivity of a given chemical reaction remains extremely difficult. As a proof of principle, this paper develops a novel composite machine learning method for quantitatively predicting enantioselectivity, that is, the degree to which one enantiomer is preferentially produced in a reaction. Specifically, machine learning methods that are widely used in data analytics, including Random Forest, Support Vector Regression, and LASSO, are utilized. In addition, Bayesian optimization and permutation importance tests are provided for accurate prediction and an in-depth understanding of the reactions. Finally, the proposed composite method approximates the key features of the available reactions by using Gaussian mixture models, which provide suitable machine learning methods for new reactions. Case studies using real stereoselective reactions show that the proposed method is effective and provides a solid foundation for further application to other chemical reactions.
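To make the workflow in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes scikit-learn and NumPy are available, uses synthetic descriptors in place of real reaction data, and picks hyperparameters by hand rather than by Bayesian optimization. The names `X`, `y`, `models`, `cv_r2`, and `ranking` are illustrative, and the Gaussian mixture step is omitted because the abstract does not describe it in enough detail to reproduce.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT from the paper): 200 "reactions", 5 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Hypothetical target: depends only on descriptors 0 and 1, plus small noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The three model families named in the abstract; hyperparameters are
# hand-picked assumptions (the paper tunes them with Bayesian optimization).
models = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
cv_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
best_name = max(cv_r2, key=cv_r2.get)

# Refit the best candidate and measure permutation importance on held-out data:
# the drop in score when one descriptor column is randomly shuffled.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
best_model = models[best_name].fit(X_train, y_train)
result = permutation_importance(
    best_model, X_test, y_test, n_repeats=10, random_state=0
)

# Descriptor indices sorted from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
print(best_name, cv_r2[best_name], ranking)
```

In this toy setting the two descriptors that generate the target should rank highest, which is the kind of check a permutation importance test offers before descriptors are interpreted chemically.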