Agbehadji Israel Edem, Obagbuwa Ibidun Christiana
Centre for Global Change, Sol Plaatje University, Kimberley, South Africa.
Department of Computer Science and Information Technology, Faculty of Natural and Applied Sciences, Sol Plaatje University, Kimberley, South Africa.
Front Artif Intell. 2025 Aug 4;8:1620019. doi: 10.3389/frai.2025.1620019. eCollection 2025.
The study addresses the problem of nonlinear characteristics of common air pollutants by proposing a deep learning time-series model based on long short-term memory (LSTM) integrated with a generalized additive model (GAM). The LSTM model captures both nonlinear relationships and long-term temporal dependencies in time-series data, and the GAM provides insight into the statistical relationship between selected features and the target pollutant. The post-hoc eXplainable artificial intelligence (xAI) technique, local interpretable model-agnostic explanation (LIME), further explains the nonlinearity. Finally, causal inference was used to assess the impact of the relationships among air pollutants, thereby offering further interpretability, an area in which deep learning models are deficient. Meteorological and air pollutant statistical records were obtained from a Hantam (Karoo) air monitoring station in South Africa, and synthetic data were generated for the city of Kimberley through a random sampling approach. The model was evaluated with the mean squared error (MSE), root mean squared error (RMSE) and mean absolute error (MAE) for different time-steps. The proposed model, referred to as the long short-term memory generalized additive model-based post-hoc eXplainable artificial intelligence (LSTM-GAM-xAI) model, achieved the lowest MSE for multiple-pollutant prediction with a 10-day time-step and a 5-day time-step. Although the causal effect analysis shows no significant p-values (>0.88) for the variables, the experimental results show that LSTM-GAM-xAI achieved the lowest MSE values across different time-steps.
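The evaluation described above (MSE, RMSE and MAE at different time-steps) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the helper names `make_windows` and `regression_errors` are hypothetical, the series is toy data rather than the Hantam or Kimberley records, and the window-mean predictor is only a placeholder standing in for a trained model, not the LSTM-GAM-xAI model itself.

```python
import numpy as np


def make_windows(series, time_step):
    """Turn a 1-D series into sliding-window inputs and next-step targets.

    Each row of X holds `time_step` consecutive observations; the matching
    entry of y is the observation that immediately follows that window.
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - time_step
    X = np.array([series[i:i + time_step] for i in range(n_windows)])
    y = series[time_step:]
    return X, y


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE and MAE for a pair of equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(residual))),
    }


# Toy daily series (assumption: synthetic values, not the study's data).
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 12, 200)) + 0.1 * rng.normal(size=200)

results = {}
for time_step in (5, 10):  # the two time-steps named in the abstract
    X, y = make_windows(series, time_step)
    # Placeholder predictor (assumption): the mean of each window.
    # A trained sequence model would supply y_pred here instead.
    y_pred = X.mean(axis=1)
    results[time_step] = regression_errors(y, y_pred)

for time_step, errors in results.items():
    print(time_step, errors)
```

Comparing the entries of `results` across time-steps mirrors, in miniature, how one might compare a model's errors for 5-day and 10-day windows; the numbers produced by this toy predictor say nothing about the study's findings.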