Kaneko Hiromasa
Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.
ACS Omega. 2022 Mar 4;7(10):8968-8979. doi: 10.1021/acsomega.1c07379. eCollection 2022 Mar 15.
In molecular design, material design, process design, and process control, it is important not only to construct models with high predictive ability between explanatory variables (X) and objective variables (y) but also to interpret the constructed models in order to clarify phenomena and elucidate mechanisms. However, even in linear models, it is dangerous to use regression coefficients as contributions of X to y because of multicollinearity among X. Thus, this study focuses on the partial least-squares model with only the first component (PLSFC). For the PLSFC model, regression coefficients can be used as contributions of X to y. In addition, a method is proposed that uses a genetic algorithm (GA) to select the combination of X that yields a predictive PLSFC model; this method is called GA-based PLSFC (GA-PLSFC). The constructed model would have both high predictive ability and high interpretability, with regression coefficients that can be defined as contributions of X to y. The effectiveness of the proposed PLSFC and GA-PLSFC is verified using numerically simulated data sets and real material data sets. The proposed method was found to be capable of constructing predictive models with high interpretability. The Python code for GA-PLSFC is available at https://github.com/hkaneko1985/dcekit.
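The multicollinearity problem and the behaviour of a one-component PLS model can be illustrated with a minimal NumPy sketch. It does not use the dcekit API and is not the author's implementation. The function name `fit_plsfc`, the simulated data, and the choice of autoscaling are assumptions made for illustration. With a single component, each standardized coefficient is a positive multiple of the correlation between that variable and y, so collinearity among the explanatory variables cannot flip its sign, whereas ordinary least-squares coefficients on the same data may split arbitrarily between near-duplicate variables.

```python
import numpy as np


def autoscale(a):
    """Center to zero mean and scale to unit variance (column-wise for 2-D input)."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def fit_plsfc(X, y):
    """Hypothetical helper: PLS with only the first component.

    Returns regression coefficients for autoscaled X and autoscaled y.
    """
    Xs, ys = autoscale(X), autoscale(y)
    w = Xs.T @ ys                 # weight vector, proportional to cov(x_j, y)
    w = w / np.linalg.norm(w)     # normalize to unit length
    t = Xs @ w                    # scores of the first component
    q = (t @ ys) / (t @ t)        # regress y on the scores (q is always > 0)
    return w * q                  # standardized regression coefficients


# Simulated data (assumption): x1 and x2 are nearly collinear, x3 is independent.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
x1 = latent + 0.05 * rng.normal(size=n)
x2 = latent + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 - 0.5 * x3 + 0.1 * rng.normal(size=n)

b_pls = fit_plsfc(X, y)
b_ols = np.linalg.lstsq(autoscale(X), autoscale(y), rcond=None)[0]
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

print("correlation with y :", np.round(corr, 3))
print("PLSFC coefficients :", np.round(b_pls, 3))
print("OLS coefficients   :", np.round(b_ols, 3))
```

Because the PLSFC coefficients are tied to the individual correlations with y, a variable subset that gives a predictive one-component model has to be chosen carefully, which is the role the abstract assigns to the GA-based selection step.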