Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration.

Authors

Li Hongdong, Liang Yizeng, Xu Qingsong, Cao Dongsheng

Affiliation

Research Center of Modernization of Traditional Chinese Medicines, College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.

Publication information

Anal Chim Acta. 2009 Aug 19;648(1):77-84. doi: 10.1016/j.aca.2009.06.046. Epub 2009 Jun 24.

Abstract

By employing the simple but effective principle of 'survival of the fittest', on which Darwin's theory of evolution is based, a novel strategy for selecting an optimal combination of key wavelengths from multi-component spectral data, named competitive adaptive reweighted sampling (CARS), is developed. Key wavelengths are defined as the wavelengths with large absolute coefficients in a multivariate linear regression model, such as partial least squares (PLS). In the present work, the absolute values of the regression coefficients of the PLS model are used as an index for evaluating the importance of each wavelength. Then, based on the importance level of each wavelength, CARS sequentially selects N subsets of wavelengths from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (e.g. 80%) of samples is first randomly selected to establish a calibration model. Next, based on the regression coefficients, a two-step procedure is adopted to select the key wavelengths: enforced wavelength selection based on an exponentially decreasing function (EDF), followed by competitive wavelength selection based on adaptive reweighted sampling (ARS). Finally, cross validation (CV) is applied to choose the subset with the lowest root mean square error of CV (RMSECV). The performance of the proposed procedure is evaluated using one simulated dataset together with one near infrared dataset with two properties. The results reveal an outstanding characteristic of CARS: it can usually locate an optimal combination of key wavelengths that are interpretable in terms of the chemical property of interest. Additionally, our study shows that CARS gives better prediction than full-spectrum PLS modeling, Monte Carlo uninformative variable elimination (MC-UVE) and moving window partial least squares regression (MWPLSR).
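The selection loop described in the abstract can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation. The abstract does not give the form of the EDF, so the sketch assumes r_i = a·exp(-k·i), scaled so that the first run keeps every wavelength and the last run keeps two. The names `edf_ratio`, `select_step`, `cars_subsets`, `fit_abs_coefs` and `toy_rmsecv` are hypothetical. The PLS fit on a random subset of samples (e.g. 80%) and the RMSECV calculation are left to caller-supplied functions, with toy stand-ins used here.

```python
import math
import random


def edf_ratio(i, n_runs, n_vars):
    """Fraction of the n_vars wavelengths kept in run i (1-based, n_runs >= 2).

    Assumed form r_i = a * exp(-k * i), with a and k chosen so that
    r_1 = 1 (keep everything) and r_N = 2 / n_vars (keep two wavelengths).
    """
    k = math.log(n_vars / 2) / (n_runs - 1)
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    return a * math.exp(-k * i)


def select_step(variables, abs_coefs, n_keep, rng):
    """One two-step selection: enforced EDF cut, then competitive ARS."""
    # Step 1 (enforced): keep the n_keep wavelengths with the largest |b|.
    ranked = sorted(zip(variables, abs_coefs), key=lambda t: t[1], reverse=True)
    kept = ranked[:n_keep]
    names = [v for v, _ in kept]
    weights = [c for _, c in kept]
    if sum(weights) == 0:  # nothing to weight by; skip the sampling step
        return sorted(names)
    # Step 2 (competitive): draw with replacement, probability proportional
    # to |b|. Duplicates collapse, so low-weight wavelengths tend to drop out.
    drawn = rng.choices(names, weights=weights, k=len(names))
    return sorted(set(drawn))


def cars_subsets(n_vars, n_runs, fit_abs_coefs, seed=0):
    """Return one wavelength subset per sampling run.

    fit_abs_coefs(variables) is a caller-supplied (hypothetical) function
    that fits a PLS model on a random subset of samples using only
    `variables` and returns the absolute regression coefficients.
    """
    rng = random.Random(seed)
    current = list(range(n_vars))
    subsets = []
    for i in range(1, n_runs + 1):
        coefs = fit_abs_coefs(current)
        n_keep = max(1, round(n_vars * edf_ratio(i, n_runs, n_vars)))
        n_keep = min(n_keep, len(current))
        current = select_step(current, coefs, n_keep, rng)
        subsets.append(current)
    return subsets


# Toy stand-ins (assumptions for illustration only): fixed "coefficients"
# in place of a PLS fit, and a made-up error in place of RMSECV.
importance = [0.05] * 20
importance[3], importance[7] = 1.0, 0.8
key = {3, 7}


def toy_rmsecv(subset):
    return len(key - set(subset)) + 0.01 * len(subset)


subsets = cars_subsets(20, 10, lambda vs: [importance[v] for v in vs])
best = min(subsets, key=toy_rmsecv)  # final step: lowest cross-validated error
print(best)
```

Because each run samples only from the wavelengths that survived the previous run, the subsets are nested and shrink over the runs. The final choice among them is made purely by the cross-validated error.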

