Liu Shuyuan, Hu Yang
College of Arts and Sciences, Northeast Agricultural University, Harbin, China.
Sci Rep. 2025 Jun 6;15(1):20014. doi: 10.1038/s41598-025-03780-4.
This study presents an innovative air quality prediction framework that integrates factor analysis with deep learning models for precise prediction of the original variables. Using data from Beijing's Tiantan station, factor analysis was applied to reduce dimensionality. We embedded the factor score matrix into a Transformer model, which leverages self-attention to capture long-term dependencies, marking a significant advancement over traditional LSTM methods. Our hybrid framework outperforms these methods and surpasses models such as Transformer, N-BEATS, and Informer combined with principal component analysis and factor analysis. Residual analysis and $R^2$ evaluation confirmed superior accuracy and stability, with the maximum likelihood factor analysis Transformer model achieving an MSE of 0.1619 and an $R^2$ of 0.8520 for factor 1, and an MSE of 0.0476 and an $R^2$ of 0.9563 for factor 2. Additionally, we introduced a cutting-edge CNN-BILSTM-ATTENTION model with discrete wavelet transform, which optimizes predictive performance by extracting local features, capturing temporal dependencies, and enhancing key time steps. Its MSE was 0.0405, with $R^2$ values all above 0.94, demonstrating exceptional performance. This study emphasizes the groundbreaking integration of factor analysis with deep learning, transforming causal relationships into conditions for predictive models. Future plans include optimizing factor extraction, exploring external data sources, and developing more efficient deep learning architectures.
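The abstract reports model quality as MSE and the coefficient of determination, but does not say how they were computed. The following is a minimal sketch of the standard textbook definitions, assuming plain per-series evaluation. The function names `mse` and `r2_score` and the sample values are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left unexplained by the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mse(observed, predicted))
print(r2_score(observed, predicted))
```

Under these definitions, a lower MSE and an $R^2$ closer to 1 both indicate a better fit. MSE depends on the scale of the target, so values for different targets are comparable only when those targets share a scale.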