School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, China.
Anal Chim Acta. 2010 May 14;667(1-2):14-32. doi: 10.1016/j.aca.2010.03.048. Epub 2010 Mar 30.
Over the past 15 years, near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors. Modern scanning instruments typically measure the NIR spectrum of a sample at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on variable selection methods in NIR spectroscopy. The methods covered include classical approaches, such as the manual approach (knowledge-based selection) and "univariate" and "sequential" selection methods; more sophisticated methods, such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies, such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms, such as interval partial least squares (iPLS), window PLS and iterative PLS. Wavelength selection with B-splines, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given.
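To make the simplest class named in the abstract concrete, the sketch below illustrates one possible form of "univariate" selection: each wavelength is scored on its own by the absolute Pearson correlation between its absorbance values and the reference value, and the top-scoring wavelengths are kept. The function name `univariate_select`, the synthetic spectra, and the choice of correlation as the ranking criterion are assumptions made for illustration; the review does not prescribe this implementation.

```python
import numpy as np


def univariate_select(X, y, k):
    """Rank wavelengths by |Pearson r| with y and return the k best.

    X : (n_samples, n_wavelengths) array of spectra (hypothetical layout)
    y : (n_samples,) array of reference values
    Returns (sorted indices of the k selected wavelengths, r for every wavelength).
    """
    Xc = X - X.mean(axis=0)          # centre each wavelength
    yc = y - y.mean()                # centre the reference values
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; give them r = 0 instead of dividing by zero.
    safe = np.where(denom > 0, denom, 1.0)
    r = np.where(denom > 0, (Xc.T @ yc) / safe, 0.0)
    top = np.argsort(-np.abs(r))[:k]  # indices of the k largest |r|
    return np.sort(top), r


# Synthetic example (not real NIR data): 300 spectra, 200 wavelengths,
# with only three wavelengths actually related to y.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 300, 200
X = rng.normal(size=(n_samples, n_wavelengths))
informative = [40, 41, 42]
y = X[:, informative].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

selected, r = univariate_select(X, y, k=3)
print("selected wavelength indices:", selected)
```

Because every wavelength is scored independently, a filter of this kind ignores the strong collinearity between neighbouring NIR wavelengths, which is one motivation for the multivariate and interval-based methods listed in the abstract.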