Yue Xiaoxin, Bai Yulong, Yu Qinghe, Ding Lin, Song Wei, Liu Wenhui, Ren Huhu, Song Qi
College of Physics and Electrical Engineering, Northwest Normal University, Lanzhou, 730070, Gansu, China.
Sci Rep. 2025 May 16;15(1):17087. doi: 10.1038/s41598-025-00911-9.
Machine learning models for predicting PM2.5 concentrations often neglect the periodic and global characteristics of the sequence data. To address this problem, this study proposes a PM2.5 concentration prediction model based on feature space reconstruction and a multihead self-attention gated recurrent unit (FSR-MSAGRU). First, frequency spectrum analysis is applied to the raw sequence data to determine the period of the PM2.5 sequence. Next, the seasonal-trend decomposition procedure based on loess (STL) is used to capture the periodicity and trend information in the PM2.5 sequence. The feature space of the PM2.5 sequence is then reconstructed from the raw PM2.5 data together with the decomposed seasonal, trend, and residual components. Finally, the reconstructed features are fed into a multihead self-attention gated recurrent unit (MSAGRU), which can capture global feature information, to predict PM2.5 concentrations. The proposed FSR-MSAGRU model achieved favorable prediction results on 6 distinct experimental datasets, with a Pearson correlation coefficient (PCC) exceeding 0.98 and a symmetric mean absolute percentage error (SMAPE) at least 68% lower than that of the GRU model. Comparative experiments against 13 reference models show that the proposed model delivers better prediction performance and stronger generalization ability.
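The first three steps of the pipeline (period detection, decomposition, feature space reconstruction) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dominant_period`, `decompose`, and `reconstruct_features` are hypothetical, the series is synthetic rather than measured PM2.5 data, and a classical moving-average decomposition stands in for STL to keep the example dependency-free. The MSAGRU predictor itself is not shown.

```python
import cmath
import math


def dominant_period(series):
    """Estimate the dominant period (in samples) from the DFT power spectrum."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the zero-frequency component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return round(n / best_k)


def decompose(series, period):
    """Classical additive decomposition, used here as a simple stand-in for STL."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average, with the window truncated at the edges.
    trend = []
    for t in range(n):
        window = series[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [x - m for x, m in zip(series, trend)]
    # Seasonal: mean detrended value at each phase of the cycle, re-centered on zero.
    phase_means = []
    for p in range(period):
        values = detrended[p::period]
        phase_means.append(sum(values) / len(values))
    offset = sum(phase_means) / period
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    # Residual: whatever the trend and seasonal components do not explain.
    residual = [x - tr - s for x, tr, s in zip(series, trend, seasonal)]
    return seasonal, trend, residual


def reconstruct_features(series, period):
    """Stack raw, seasonal, trend, and residual values into one row per time step."""
    seasonal, trend, residual = decompose(series, period)
    return [list(row) for row in zip(series, seasonal, trend, residual)]


# Synthetic hourly-like series (assumed for illustration): daily cycle plus slow drift.
series = [50 + 0.02 * t + 10 * math.sin(2 * math.pi * t / 24) for t in range(240)]
period = dominant_period(series)
features = reconstruct_features(series, period)
```

Each row of `features` holds four values for one time step, so the downstream model sees the periodic and trend structure explicitly instead of having to infer it from the raw sequence alone. In practice an STL implementation would replace `decompose`, with the detected period passed as its seasonal period.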