A hybrid variable selection strategy based on continuous shrinkage of variable space in multivariate calibration.

Affiliations

College of Food Science and Technology, Hainan University, Haikou, 570228, China; Institute of Environment and Plant Protection, Chinese Academy of Tropical Agricultural Sciences, Haikou, 571101, PR China.

College of Tobacco Science, Guizhou University, Guiyang, 550025, China.

Publication information

Anal Chim Acta. 2019 Jun 13;1058:58-69. doi: 10.1016/j.aca.2019.01.022. Epub 2019 Jan 21.

Abstract

When analyzing high-dimensional near-infrared (NIR) spectral datasets, variable selection is critical to improving a model's predictive ability. However, when the number of variables is large, some methods suffer from serious limitations, such as a high risk of overfitting, long run times, or heavy computational demands. In this study, we propose a hybrid variable selection strategy based on the continuous shrinkage of the variable space, which is the core idea of variable combination population analysis (VCPA). In the first step, the VCPA-based hybrid strategy continuously shrinks the variable space from large to small and optimizes it using a modified VCPA. In the second step, it applies either iteratively retaining informative variables (IRIV) or a genetic algorithm (GA) for further optimization, giving VCPA-IRIV and VCPA-GA, respectively. The strategy takes full advantage of VCPA, GA, and IRIV while compensating for their drawbacks when the number of variables is large. Three NIR datasets were used to evaluate the improvement offered by the VCPA-based hybrid strategy against three variable selection methods: two widely used methods, competitive adaptive reweighted sampling (CARS) and genetic algorithm-interval partial least squares (GA-iPLS), and one hybrid method, variable importance in projection coupled with a genetic algorithm (VIP-GA). The results show that VCPA-GA and VCPA-IRIV significantly improve the models' prediction performance compared with the other methods, indicating that the modified VCPA step is a very efficient way to filter out uninformative variables and that the VCPA-based hybrid strategy is a promising approach to variable selection in NIR spectroscopy. The MATLAB source codes of VCPA-GA and VCPA-IRIV can be freely downloaded from: https://cn.mathworks.com/matlabcentral/profile/authors/5526470-yonghuan-yun.
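
The two-step idea described in the abstract can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' modified VCPA or their MATLAB code. It assumes (1) binary matrix sampling in which each remaining variable enters exactly half of the sub-models, (2) a user-supplied score where lower is better (in practice something like the cross-validated error of a PLS model on the selected wavelengths; here a toy function), (3) retention of the variables that occur most often in the best-scoring fraction of sub-models, and (4) an exponentially decreasing schedule for the size of the retained space. The function names and all default parameter values are illustrative assumptions. For the second step, a plain exhaustive search over the small remaining space stands in for GA or IRIV; it is a swapped-in technique, not either of the paper's second-step methods.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence

# Lower is better, e.g. (assumption) the RMSECV of a PLS model built on these variable indices.
Score = Callable[[Sequence[int]], float]


def edf_schedule(p: int, final_size: int, n_iter: int) -> List[int]:
    """Hypothetical exponentially decreasing number of variables kept after each iteration."""
    assert 1 <= final_size <= p
    k = math.log(p / final_size) / n_iter
    return [max(final_size, round(p * math.exp(-k * i))) for i in range(1, n_iter + 1)]


def binary_matrix_sampling(space: List[int], n_models: int, rng: random.Random) -> List[List[int]]:
    """Each variable appears in exactly half of the sub-models, at shuffled positions."""
    columns = {}
    for v in space:
        col = [True] * (n_models // 2) + [False] * (n_models - n_models // 2)
        rng.shuffle(col)
        columns[v] = col
    rows = [[v for v in space if columns[v][m]] for m in range(n_models)]
    return [row for row in rows if row]  # drop empty sub-models


def shrink_variable_space(p: int, score: Score, final_size: int = 8, n_iter: int = 10,
                          n_models: int = 500, top_frac: float = 0.05, seed: int = 0) -> List[int]:
    """Step 1 (sketch): shrink the variable space from p variables down to final_size."""
    rng = random.Random(seed)
    space = list(range(p))
    for keep in edf_schedule(p, final_size, n_iter):
        subsets = sorted(binary_matrix_sampling(space, n_models, rng), key=score)
        best = subsets[: max(1, int(top_frac * len(subsets)))]
        # Count how often each variable occurs among the best-scoring sub-models.
        freq = {v: 0 for v in space}
        for subset in best:
            for v in subset:
                freq[v] += 1
        # Retain the most frequent variables; ties are broken by index for determinism.
        space = sorted(sorted(space, key=lambda v: (-freq[v], v))[:keep])
    return space


def exhaustive_refine(space: List[int], score: Score) -> List[int]:
    """Step 2 stand-in: exhaustive search (NOT GA or IRIV) over the small remaining space."""
    candidates = (list(c) for r in range(1, len(space) + 1)
                  for c in itertools.combinations(space, r))
    return min(candidates, key=score)


if __name__ == "__main__":
    informative = {3, 17, 42}  # hypothetical "true" wavelengths in a toy problem

    def toy_score(subset: Sequence[int]) -> float:
        s = set(subset)
        # Missing an informative variable costs 1; each uninformative one costs 0.01.
        return len(informative - s) + 0.01 * len(s - informative)

    reduced = shrink_variable_space(100, toy_score)
    print("after shrinkage:", reduced)
    print("after refinement:", exhaustive_refine(reduced, toy_score))
```

On the toy problem the shrinkage step should keep the three hypothetical informative indices among the final eight, and the refinement step should then isolate them. Replacing `toy_score` with a real cross-validated model error is the main change needed to try the idea on spectra; enumeration only stays feasible because step 1 has already made the space small, which mirrors the motivation for shrinking first.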
