Genetic Algorithm-Based Partial Least-Squares with Only the First Component for Model Interpretation.

Author

Kaneko Hiromasa

Affiliation

Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.

Publication information

ACS Omega. 2022 Mar 4;7(10):8968-8979. doi: 10.1021/acsomega.1c07379. eCollection 2022 Mar 15.

DOI:10.1021/acsomega.1c07379
PMID:35309472
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8928558/
Abstract

In molecular design, material design, process design, and process control, it is important not only to construct models with high predictive ability between explanatory variables and objective variables but also to interpret the constructed models to clarify phenomena and elucidate mechanisms in these fields. However, even in linear models, it is dangerous to use regression coefficients as contributions of the explanatory variables to the objective variable because of multicollinearity among the explanatory variables. This study therefore focuses on the partial least-squares model with only the first component (PLSFC), for which the regression coefficients can be used as contributions of the explanatory variables to the objective variable. In addition, a genetic algorithm (GA) is proposed for selecting the combination of explanatory variables that yields a predictive PLSFC model; this approach is called GA-based PLSFC (GA-PLSFC). The constructed model would have both high predictive ability and high interpretability, with regression coefficients that can be defined as contributions of the explanatory variables to the objective variable. The effectiveness of the proposed PLSFC and GA-PLSFC is verified using numerically simulated data sets and real material data sets. The proposed method was found to be capable of constructing predictive models with high interpretability. The Python code for GA-PLSFC is available at https://github.com/hkaneko1985/dcekit.
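
The idea in the abstract can be illustrated with a small NumPy sketch. It is not the dcekit implementation, and every function name, the simulated data, the GA settings, and the choice of cross-validated r² as the fitness function are assumptions made for illustration. With a single PLS component and autoscaled data, the coefficient vector is a positive multiple of the covariance between each explanatory variable and the objective variable, so each coefficient has the sign of that variable's own correlation with the objective variable, even when variables are nearly collinear. A simple GA then searches over variable subsets.

```python
import numpy as np

def autoscale(a):
    """Center each column and scale it to unit standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def plsfc_coefficients(X, y):
    """Standardized regression coefficients of a one-component PLS model."""
    Xs, ys = autoscale(X), autoscale(y)
    w = Xs.T @ ys
    w /= np.linalg.norm(w)      # weight vector, proportional to cov(x_j, y)
    t = Xs @ w                  # score of the first component
    q = (ys @ t) / (t @ t)      # regression of y on the score (always > 0)
    return w * q                # b = w q, because p^T w = 1 for one component

def cv_r2(X, y, n_folds=5):
    """Cross-validated r^2 of the one-component model (assumed fitness)."""
    folds = np.arange(len(y)) % n_folds
    pred = np.empty(len(y))
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        mx, sx = X[tr].mean(axis=0), X[tr].std(axis=0, ddof=1)
        my, sy = y[tr].mean(), y[tr].std(ddof=1)
        b = plsfc_coefficients(X[tr], y[tr])
        pred[te] = ((X[te] - mx) / sx) @ b * sy + my
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, pop_size=30, n_gen=20, p_mut=0.05, seed=0):
    """Toy GA over boolean variable masks; settings are illustrative only."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.5

    def fitness(mask):
        return cv_r2(X[:, mask], y) if mask.any() else -np.inf

    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]            # truncation selection
        pa = parents[rng.integers(len(parents), size=pop_size)]
        pb = parents[rng.integers(len(parents), size=pop_size)]
        cross = rng.random((pop_size, n_var)) < 0.5
        children = np.where(cross, pa, pb)               # uniform crossover
        children ^= rng.random((pop_size, n_var)) < p_mut  # bit-flip mutation
        children[0] = pop[order[0]]                      # keep the best mask
        pop = children
    scores = np.array([fitness(m) for m in pop])
    return pop[np.argmax(scores)]

# Simulated data: x1 and x2 are nearly collinear, x3..x5 are irrelevant.
rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.05 * rng.normal(size=n),
    z + 0.05 * rng.normal(size=n),
    rng.normal(size=(n, 3)),
])
y = z + 0.3 * rng.normal(size=n)

b = plsfc_coefficients(X, y)
ols = np.linalg.lstsq(autoscale(X), autoscale(y), rcond=None)[0]
mask = ga_select(X, y)
print("PLSFC coefficients:", np.round(b, 3))
print("OLS coefficients:  ", np.round(ols, 3))
print("GA-selected variables:", np.flatnonzero(mask))
```

Comparing the two printed coefficient vectors shows how ordinary least squares can split the effect of the collinear pair arbitrarily, whereas the one-component coefficients follow each variable's own covariance with the objective variable.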
