Wang Qiaoyun, Zou Xin, Chen Yinji, Zhu Ziheng, Yan Chongyue, Shan Peng, Wang Shuyu, Fu Yongqing
College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China; Hebei Key Laboratory of Micro-Nano Precision Optical Sensing and Measurement Technology, Qinhuangdao 066004, China.
College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2024 Dec 15;323:124917. doi: 10.1016/j.saa.2024.124917. Epub 2024 Jul 31.
To improve prediction performance and reduce artifacts in Raman spectra, we developed an eXtreme Gradient Boosting (XGBoost) preprocessing method and applied it to the Raman spectra of glucose, glycerol and ethanol mixtures. To verify the robustness and reliability of the XGBoost preprocessing method, we built two models: an X-LR model, which combines XGBoost preprocessing with a linear regression (LR) model, and an X-MLP model, which combines XGBoost preprocessing with a multilayer perceptron (MLP) model. Both models were used to quantify the concentrations of glucose, glycerol and ethanol from the Raman spectra of the mixed solutions. A proportion map of hyperparameters was first used to narrow the hyperparameter search range for the X-LR and X-MLP models. The correlation coefficient (R), root mean square error of calibration (RMSEC), and root mean square error of prediction (RMSEP) were then used to evaluate model performance. Experimental results indicated that the XGBoost preprocessing method achieved higher accuracy and better generalization than the other preprocessing methods tested, for both the LR and the MLP models.
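The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the common definitions: R as the Pearson correlation between reference and predicted concentrations, and RMSEC and RMSEP as the plain root mean square error on the calibration and prediction sets respectively. The abstract does not give the exact formulas (some chemometrics texts divide by degrees of freedom rather than by n for RMSEC), and the function names and sample values below are hypothetical, not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, dividing by n (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values.

    Raises ZeroDivisionError if either input has zero variance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def evaluate(y_cal, y_cal_pred, y_test, y_test_pred):
    """Collect R, RMSEC and RMSEP for one analyte."""
    return {
        "R_cal": pearson_r(y_cal, y_cal_pred),
        "RMSEC": rmse(y_cal, y_cal_pred),      # error on the calibration set
        "R_pred": pearson_r(y_test, y_test_pred),
        "RMSEP": rmse(y_test, y_test_pred),    # error on the held-out prediction set
    }


# Hypothetical concentrations, for illustration only (not data from the paper).
cal_ref, cal_hat = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]
test_ref, test_hat = [1.5, 2.5, 3.5], [1.4, 2.7, 3.4]
print(evaluate(cal_ref, cal_hat, test_ref, test_hat))
```

Under these definitions, a preprocessing method compares favourably when it yields R closer to 1 and lower RMSEC and RMSEP; an RMSEP far above the RMSEC would suggest overfitting to the calibration set.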