Enhancing PM2.5 prediction by mitigating annual data drift using wrapped loss and neural networks.

Author information

Hossen Md Khalid, Peng Yan-Tsung, Chen Meng Chang

Affiliations

Social Networks and Human-Centered Computing, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan.

Department of Computer Science, National Chengchi University, Taipei, Taiwan.

Publication information

PLoS One. 2025 Feb 11;20(2):e0314327. doi: 10.1371/journal.pone.0314327. eCollection 2025.

DOI:10.1371/journal.pone.0314327

PMID:39932913

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11813127/

Abstract

Many deep learning tasks assume that all training data are sampled from the same distribution. This assumption may not hold for data collected in different contexts or during different periods. For instance, the temperatures in a city can vary from year to year for reasons that are not well understood. In this paper, we used three distinct statistical techniques to analyze annual data drift at various stations. Each technique computes a P value for each station by comparing data across five years (2014-2018); these P values measure the drift at specific locations and identify the stations with the highest drift relative to previous years' datasets. To identify data drift and highlight areas with significant drift, we used meteorological air quality and weather data. We proposed two models that account for the characteristics of data drift in PM2.5 prediction, the Front-loaded connection (FLC) model and the Back-loaded connection (BLC) model, and compared them with various deep learning models, such as Long Short-Term Memory (LSTM) and its variants, for predictions from the next hour to the 64th hour. Our proposed models significantly outperform traditional neural networks. Additionally, we introduced a wrapped loss function incorporated into a model, which yields more accurate results than the original loss function alone; predictions were evaluated with the RMSE, MAE and MAPE metrics. The FLC and BLC models address the data drift issue, and the wrapped loss function also helps alleviate data drift during model training, allowing neural network models to achieve more accurate results.
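The abstract does not name the three statistical techniques used to obtain the P values, so the following is only a minimal sketch of the general idea, assuming a two-sided permutation test on the difference of yearly means as a stand-in. The station names, the synthetic readings, and the helper names `permutation_p_value` and `rank_stations_by_drift` are all hypothetical and are not taken from the paper.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(year_a, year_b, n_perm=1000, seed=0):
    """Two-sided permutation test using the difference of means.

    Null hypothesis: the two yearly samples are exchangeable, i.e. they
    come from the same distribution. A small P value suggests drift.
    """
    rng = random.Random(seed)
    observed = abs(_mean(year_a) - _mean(year_b))
    pooled = list(year_a) + list(year_b)
    n_a = len(year_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


def rank_stations_by_drift(readings, year_a, year_b):
    """Return (station, P value) pairs, smallest P value (most drift) first."""
    p_values = {
        station: permutation_p_value(by_year[year_a], by_year[year_b])
        for station, by_year in readings.items()
    }
    return sorted(p_values.items(), key=lambda item: item[1])


# Synthetic PM2.5-like values (not real measurements) for two made-up stations.
gen = random.Random(42)
readings = {
    "station_stable": {
        2017: [gen.gauss(20.0, 5.0) for _ in range(200)],
        2018: [gen.gauss(20.0, 5.0) for _ in range(200)],
    },
    "station_drifting": {
        2017: [gen.gauss(20.0, 5.0) for _ in range(200)],
        2018: [gen.gauss(30.0, 5.0) for _ in range(200)],
    },
}

ranking = rank_stations_by_drift(readings, 2017, 2018)
for station, p in ranking:
    print(f"{station}: P = {p:.4f}")
```

A permutation test is used here only because it needs no external libraries; any two-sample test that returns a P value could be substituted without changing the ranking step.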
Finally, the experimental results show that, in hourly PM2.5 prediction, the proposed model improves on the baseline BiLSTM model by 24.1%-16% at 1h-24h and by 12%-8.3% at 32h-64h, and on the CNN model by 24.6%-11.8% at 1h-24h and by 10%-10.2% at 32h-64h.
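The RMSE, MAE and MAPE metrics named in the abstract have standard definitions, sketched below in plain Python. The `relative_improvement` helper is an assumption about how a percentage gain over a baseline could be computed (error reduction divided by baseline error); the abstract does not state the exact formula, and the sample values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction in error relative to a baseline (assumed formula)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Made-up PM2.5 values and predictions, for illustration only.
actual = [10.0, 20.0, 40.0]
baseline_pred = [14.0, 16.0, 44.0]
proposed_pred = [12.0, 18.0, 42.0]

for name, metric in (("RMSE", rmse), ("MAE", mae), ("MAPE", mape)):
    base = metric(actual, baseline_pred)
    prop = metric(actual, proposed_pred)
    gain = relative_improvement(base, prop)
    print(f"{name}: baseline={base:.3f} proposed={prop:.3f} gain={gain:.1f}%")
```

Because MAPE divides by the observed value, it becomes unstable when concentrations are near zero, which is one reason to report it alongside RMSE and MAE rather than on its own.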

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/377e/11813127/fae0fe863483/pone.0314327.g001.jpg
