Yamaguchi Shigeru, Nishimura Takahiro, Hibe Yuta, Nagai Masaki, Sato Hirofumi, Johnston Ian
RIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
Department of Chemistry, Graduate School of Science, Kyoto University, Sakyo-ku, Kyoto, 606-8502, Japan.
J Comput Chem. 2017 Jun 5;38(21):1825-1833. doi: 10.1002/jcc.24791. Epub 2017 Mar 27.
In organic chemistry, Comparative Molecular Field Analysis (CoMFA) can be defined as a regression analysis between reaction outcomes and molecular fields, in which important structural information is extracted and visualized from the coefficients of the fitted regression models. In CoMFA, the regression models are usually fitted by partial least-squares (PLS) regression, which estimates every coefficient in the model. In organic reactions, however, steric effects are observed only near the reactive site, which indicates that a large number of the regression coefficients in the CoMFA of organic reactions should be set to 0. The regularized regression methods LASSO and Elastic Net allow us to fit the regression model while assigning 0 to unimportant coefficients. Although LASSO/Elastic Net should therefore be well suited to CoMFA, there has been no example of its use for organic reaction analysis. Herein, we examine the performance of LASSO/Elastic Net for the quantification of steric effects in CoMFA. We employ digitized molecular structures (the indicator field) as the molecular fields that represent steric effects. LASSO/Elastic Net regressions provide highly interpretable models that contain less noise than those from PLS regression. © 2017 Wiley Periodicals, Inc.
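The sparsity argument in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the indicator field is a synthetic random 0/1 matrix, the grid indices 3 and 7 standing in for positions "near the reactive site" are hypothetical, and the objective follows the common Elastic Net parametrization, (1/(2n))·||y − Xw||² + α·ρ·||w||₁ + (α·(1 − ρ)/2)·||w||², where ρ = 1 gives LASSO. The solver is plain cyclic coordinate descent written in standard-library Python, and the names `elastic_net_cd`, `alpha`, and `l1_ratio` are assumptions made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, alpha, l1_ratio=1.0, n_sweeps=200):
    """Cyclic coordinate descent for Elastic Net (l1_ratio=1.0 is LASSO).

    Minimises (1/(2n))*||y - Xw||^2 + alpha*l1_ratio*||w||_1
              + 0.5*alpha*(1 - l1_ratio)*||w||^2, with no intercept.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # grid point never occupied: coefficient stays 0
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio)
            )
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic stand-in for an indicator field: 1 if a molecule occupies a grid
# point, 0 otherwise. All sizes and values below are illustrative assumptions.
random.seed(0)
n_molecules, n_grid_points = 200, 30
X = [[random.randint(0, 1) for _ in range(n_grid_points)] for _ in range(n_molecules)]

# Only two hypothetical grid points "near the reactive site" affect the outcome.
true_w = [0.0] * n_grid_points
true_w[3], true_w[7] = -2.0, 1.5
y = [
    sum(x_ij * w_j for x_ij, w_j in zip(row, true_w)) + random.gauss(0.0, 0.05)
    for row in X
]

weights = elastic_net_cd(X, y, alpha=0.05, l1_ratio=1.0)
for j, w_j in enumerate(weights):
    if w_j != 0.0:
        print(f"grid point {j}: coefficient {w_j:.3f}")
```

Only the coefficients that survive the soft-thresholding step are printed, which is the property the abstract relies on for interpretability. Setting `l1_ratio` below 1.0 switches the same routine to Elastic Net; a PLS fit is not included here, so the sketch shows the sparsity of the regularized model rather than the comparison itself.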