Zhang Jixiong, Yan Hong, Xiong Yanmei, Li Qianqian, Min Shungeng
College of Science, China Agricultural University, No. 2, Yuanmingyuanxi Road, Haidian District, Beijing 100193, P.R. China
School of Marine Science, China University of Geosciences in Beijing, Beijing 100086, China
RSC Adv. 2019 Feb 26;9(12):6708-6716. doi: 10.1039/c8ra08754g. eCollection 2019 Feb 22.
Wavelength selection is a critical factor for pattern recognition of vibrational spectroscopic data. Not only does it alleviate the effect of high dimensionality on an algorithm's generalization performance, but it also makes multivariate classification models easier to understand and interpret. In this study, a novel partial least squares discriminant analysis (PLSDA)-based wavelength selection algorithm, termed ensemble of bootstrapping space shrinkage (EBSS), is devised for vibrational spectroscopic data analysis. In the algorithm, a set of subsets is generated from a data set by random sampling. For each subset, a feature space is determined by maximizing the expected 10-fold cross-validation accuracy with a weighted bootstrap sampling strategy. An ensemble strategy and a sequential forward selection method are then applied to the resulting feature spaces to select characteristic variables. Experimental results on real vibrational spectroscopic data sets demonstrate that the ensemble wavelength selection algorithm retains stable and informative variables for the final model and improves the predictive ability of multivariate classification models.
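The three stages described above (random subsets, weighted bootstrap shrinkage of the variable space, then ensemble voting followed by sequential forward selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. To stay dependency-free, a nearest-centroid classifier stands in for PLSDA. The weight update (frequency of each variable among the best draws), the 0.5 retention threshold, the vote-based ranking, and all function names and default parameters are assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import Counter

def cv_accuracy(X, y, feats, k=10):
    """k-fold CV accuracy of a nearest-centroid classifier (stand-in for PLSDA)."""
    n, correct = len(X), 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        cents = {}
        for c in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == c]
            cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
        for i in range(fold, n, k):  # held-out samples of this fold
            dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cen))
                    for c, cen in cents.items()}
            correct += min(dist, key=dist.get) == y[i]
    return correct / n

def shrink_space(X, y, rng, n_iter=5, n_draws=30, top=0.2):
    """Weighted bootstrap over variables (assumed update rule, see lead-in)."""
    p = len(X[0])
    w = [1.0 / p] * p
    for _ in range(n_iter):
        # Draw variable subsets with replacement, keeping unique variables.
        draws = [sorted(set(rng.choices(range(p), weights=w, k=p)))
                 for _ in range(n_draws)]
        draws.sort(key=lambda f: cv_accuracy(X, y, f), reverse=True)
        best = draws[: max(1, int(top * n_draws))]
        counts = Counter(j for f in best for j in f)
        # New weight = share of the best draws that contain the variable.
        w = [counts[j] / len(best) for j in range(p)]
    return [j for j in range(p) if w[j] > 0.5]  # hypothetical threshold

def ebss_sketch(X, y, n_subsets=5, frac=0.8, seed=0):
    """Ensemble of shrunken spaces, then sequential forward selection."""
    rng = random.Random(seed)
    n, votes = len(X), Counter()
    for _ in range(n_subsets):
        idx = rng.sample(range(n), int(frac * n))  # random sample subset
        votes.update(shrink_space([X[i] for i in idx], [y[i] for i in idx], rng))
    ranked = [j for j, _ in votes.most_common()]  # most-voted variables first
    selected, best_acc = [], 0.0
    for j in ranked:  # keep a variable only if CV accuracy improves
        acc = cv_accuracy(X, y, selected + [j])
        if acc > best_acc:
            selected, best_acc = selected + [j], acc
    return selected, best_acc

def make_toy_data(n=40, p=8, seed=1):
    """Synthetic two-class data; only variables 0 and 1 carry class information."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(3.0 * c if j < 2 else 0.0, 1.0) for j in range(p)] for c in y]
    return X, y
```

Calling `selected, acc = ebss_sketch(*make_toy_data())` returns the indices of the retained variables and their cross-validated accuracy on the synthetic data. For real spectra, `cv_accuracy` would be replaced by a PLSDA model evaluated in the same way.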