Fatima Narmeen, Yousafzai Samia Nawaz, Nemri Nadhem, Alsolai Hadeel, Ebad Shouki A, Sorour Shaymaa, Gu Yeonghyeon, Syafrudin Muhammad, Fitriyani Norma Latif
Applied INTelligence Lab (AINTLab), Seoul, 05006, Republic of Korea.
College of Earth and Environmental Sciences, University of the Punjab, Lahore, 54000, Pakistan.
Sci Rep. 2025 Aug 9;15(1):29157. doi: 10.1038/s41598-025-12422-8.
Air pollution has become a pressing global concern, demanding accurate forecasting systems to safeguard public health. Existing air quality index (AQI) prediction models often falter because of missing data, high variability, and a limited ability to handle distributional uncertainty. This study introduces a deep learning framework that integrates Kalman Attention with a Bidirectional Gated Recurrent Unit (Bi-GRU) for robust AQI time-series forecasting. Unlike conventional attention mechanisms, Kalman Attention adjusts dynamically to data uncertainty, improving the weighting of temporal features. In addition, we incorporate a Chi-square divergence-based regularization term into the loss function to explicitly minimize the distributional mismatch between predicted and actual pollutant levels, a contribution not explored in prior AQI models. Missing values are imputed with a pollutant-specific ARIMA model to preserve time-dependent trends. The proposed system is evaluated on real-world U.S. Environmental Protection Agency data (2022-2024) for six major pollutants (CO, NO₂, O₃, SO₂, PM2.5, PM10) in the Denver-Aurora-Lakewood region. Experimental results show significant improvements over baseline models (LSTM, CNN-LSTM), with an R² of 0.96794, an MSE of 4.11×10[Formula: see text], and an MAE of 0.000423. By addressing uncertainty, distributional alignment, and missing data within a unified architecture, this work advances AQI forecasting and provides a scalable solution for environmental monitoring and policy support.
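The abstract does not give the exact form of the Chi-square divergence regularizer. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the loss is MSE plus a weight `lam` times the Pearson divergence χ²(P‖Q) = Σᵢ (pᵢ − qᵢ)² / qᵢ. Here P and Q are equal-width histograms of the actual and predicted values over a shared range, with additive (Laplace) smoothing `alpha` so that every bin of Q is non-zero. The function names, the histogram binning, the smoothing, and the divergence direction are all hypothetical choices made for illustration. Hard binning is not differentiable, so a training loop would need a differentiable surrogate (for example, a kernel-based soft histogram). The sketch is meant only to show what the penalty measures.

```python
from typing import List, Sequence

# Hypothetical sketch: MSE + lam * chi^2(actual || predicted).
# Binning, Laplace smoothing and the divergence direction are assumptions,
# not details taken from the paper.


def smoothed_histogram(values: Sequence[float], lo: float, hi: float,
                       bins: int, alpha: float) -> List[float]:
    """Equal-width histogram over [lo, hi] with additive (Laplace) smoothing.

    Returns bin probabilities that sum to 1, each strictly positive when alpha > 0.
    """
    counts = [0] * bins
    width = (hi - lo) / bins if hi > lo else 1.0  # degenerate range: every value lands in bin 0
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    total = len(values) + alpha * bins
    return [(c + alpha) / total for c in counts]


def chi_square_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Pearson chi-square divergence: chi^2(P || Q) = sum_i (p_i - q_i)^2 / q_i."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def regularized_loss(y_true: Sequence[float], y_pred: Sequence[float],
                     lam: float = 0.1, bins: int = 10, alpha: float = 1.0) -> float:
    """MSE plus lam times the chi-square divergence between value histograms."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if alpha <= 0:
        raise ValueError("alpha must be positive so every bin of Q is non-zero")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # A shared range keeps both histograms on identical bin edges.
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    p = smoothed_histogram(y_true, lo, hi, bins, alpha)  # distribution of actual values
    q = smoothed_histogram(y_pred, lo, hi, bins, alpha)  # distribution of predicted values
    return mse + lam * chi_square_divergence(p, q)


if __name__ == "__main__":
    actual = [0.0, 0.0, 1.0, 1.0]
    predicted = [0.0, 0.0, 0.0, 0.0]  # predictions collapse onto a single value
    print(regularized_loss(actual, predicted, lam=0.1, bins=2))
```

In the usage example, the MSE alone is 0.5. The divergence term adds a further penalty because the predictions put all their mass in one bin, while the actual values are split evenly across two. When the predicted and actual values coincide, both terms are zero. This is the kind of distributional collapse that a pure pointwise loss does not penalize separately.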