Cui Jiaheng, Chen Xianyan, Zhao Yiping
School of Electrical and Computer Engineering, College of Engineering, The University of Georgia, Athens, Georgia 30602, United States.
Department of Epidemiology & Biostatistics, College of Public Health, The University of Georgia, Athens, Georgia 30602, United States.
Anal Chem. 2025 Aug 5;97(30):16211-16218. doi: 10.1021/acs.analchem.5c01253. Epub 2025 Jul 26.
Baseline correction is a critical preprocessing step in Raman and surface-enhanced Raman spectroscopy analysis. The adaptive iteratively reweighted penalized least-squares (airPLS) method is widely used due to its simplicity and efficiency, but its effectiveness is often hindered by challenges such as baseline smoothness, parameter sensitivity, and inconsistent performance under complex spectral conditions. To address these limitations, we developed an optimized airPLS algorithm (OP-airPLS) that systematically fine-tunes key parameters using an adaptive grid search method. We further implemented a machine learning model to predict these parameters through spectral shape recognition. A data set of 6000 simulated spectra representing 12 spectral shapes (comprising three peak types and four baseline variations) was used for evaluation. On average, OP-airPLS achieved a percentage improvement (PI) of 96 ± 2%, with the maximum improvement reducing the mean absolute error (MAE) from 0.103 to 5.55 × 10⁻⁴ (PI = 99.46 ± 0.06%) and the minimum improvement lowering the MAE from 0.061 to 5.68 × 10⁻³ (PI = 91 ± 7%). The optimal parameters for each spectral shape were found to lie within a well-defined linear region of the parameter space. Although OP-airPLS significantly improved baseline correction accuracy, it required substantial computational resources and relied on known true baselines. To overcome these constraints, a machine learning approach combining principal component analysis and random forest (PCA-RF) was developed to predict the optimal parameters directly from input spectra. The PCA-RF model performed robustly, achieving an overall PI of 90 ± 10% while requiring only 0.038 s to process each spectrum. When this method is applied to real spectra, its baseline estimation performance is constrained by both the signal-to-noise ratio and the similarity of the spectral shape to the training data.
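The abstract gives no implementation details, so the following is only a rough sketch of the idea behind OP-airPLS under stated assumptions. It pairs a simplified airPLS (iterative reweighting around a Whittaker smoother) with a plain exhaustive grid search over the smoothing parameter λ and the difference order, standing in for the paper's adaptive grid search. Several elements are assumptions rather than details from the paper: which parameters OP-airPLS actually tunes, the "default" setting (λ = 100, order 1), the synthetic spectrum, and the helper names (`airpls`, `grid_search`, `percentage_improvement`). The PI formula, PI = 100 × (MAE_default − MAE_opt) / MAE_default, is also assumed. It is at least consistent with the reported figures: an MAE drop from 0.103 to 5.55 × 10⁻⁴ gives 99.46%. The sketch requires NumPy and SciPy.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def whittaker_smooth(y, w, lam, order):
    """Weighted penalized least squares: solve (W + lam * D'D) z = W y."""
    n = y.size
    D = sparse.eye(n, format="csr")
    for _ in range(order):  # build the order-th difference matrix
        D = D[1:] - D[:-1]
    A = (sparse.diags(w) + lam * (D.T @ D)).tocsc()
    return spsolve(A, w * y)


def airpls(y, lam=100.0, order=1, max_iter=15):
    """Simplified airPLS baseline estimate (assumed defaults: lam=100, order=1)."""
    w = np.ones(y.size)
    for i in range(1, max_iter + 1):
        z = whittaker_smooth(y, w, lam, order)
        d = y - z
        dssn = np.abs(d[d < 0].sum())  # total magnitude of points below the fit
        if dssn <= 1e-3 * np.abs(y).sum() or i == max_iter:
            break
        w[d >= 0] = 0.0  # points at or above the fit are treated as peaks
        w[d < 0] = np.exp(i * np.abs(d[d < 0]) / dssn)
        # endpoint weights stay positive so the linear system remains well-posed
        w[0] = w[-1] = np.exp(i * d[d < 0].max() / dssn)
    return z


def mae(a, b):
    return float(np.mean(np.abs(a - b)))


def grid_search(y, true_baseline, lams, orders):
    """Exhaustive (non-adaptive) search for the (lam, order) pair with the lowest MAE."""
    best = None
    for order in orders:
        for lam in lams:
            err = mae(airpls(y, lam, order), true_baseline)
            if best is None or err < best[0]:
                best = (err, lam, order)
    return best


def percentage_improvement(mae_default, mae_opt):
    """Assumed PI definition: relative MAE reduction, in percent."""
    return 100.0 * (mae_default - mae_opt) / mae_default


# Hypothetical simulated spectrum: smooth baseline + Gaussian peaks + noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
baseline = 0.5 + 0.8 * x - 0.6 * x**2
peaks = sum(a * np.exp(-(((x - c) / s) ** 2))
            for a, c, s in [(1.0, 0.3, 0.01), (0.6, 0.55, 0.02), (0.8, 0.8, 0.015)])
spectrum = baseline + peaks + rng.normal(0.0, 0.01, x.size)

default_mae = mae(airpls(spectrum), baseline)
best_mae, best_lam, best_order = grid_search(
    spectrum, baseline, [10.0**k for k in range(7)], (1, 2))
print(f"default MAE={default_mae:.4g}, best MAE={best_mae:.4g} "
      f"(lam={best_lam:g}, order={best_order})")
print(f"PI = {percentage_improvement(default_mae, best_mae):.1f}%")
```

Because the grid includes the assumed default setting, the searched MAE can never be worse than the default MAE. Like OP-airPLS, however, the search only works when the true baseline is known, which is the limitation the abstract's PCA-RF model is meant to remove.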