Iwaszenko Sebastian, Smolinski Adam, Grzanka Marcin, Skowronek Tomasz
Central Mining Institute - National Research Institute, Plac Gwarków 1, 40-166 Katowice, Poland.
eGminy Sp. z o.o., Cieszyńska 365, 43-300 Bielsko-Biała, Poland.
Sci Rep. 2024 Aug 16;14(1):18999. doi: 10.1038/s41598-024-70152-9.
Air quality is a fundamental component of a healthy environment for human beings. Monitoring networks for air pollution have been established in numerous industrial zones. The data collected by these pervasive monitoring devices can be used not only to determine the current environmental condition, but also to forecast it in the near future. This paper considers the application of different machine learning methods to predicting the two most widely used quantities: particulate matter (PM) with diameters of 2.5 and 10 µm (PM 2.5 and PM 10, respectively). The data were collected via a proprietary monitoring station, designated the Ecolumn. The Ecolumn monitors a number of key parameters, including temperature, pressure, humidity, PM 1.0, PM 2.5, and PM 10, in a timely manner. The data were used to develop multiple models based on selected machine learning methods: decision tree, random forest, recurrent neural network, and long short-term memory (LSTM). Experiments were conducted with varying hyperparameters and network architectures. Different time scales (10 min, 1 h, and 24 h) were examined. The best results were observed for the LSTM algorithm when using the shortest available averaging time. The decision tree and random forest algorithms performed unexpectedly well for long averaging times and showed only a slight decline in accuracy relative to the neural networks for shorter averaging times. Recommendations for the potential applicability of the tested methods were formulated.
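The three time scales mentioned in the abstract (10 min, 1 h, and 24 h) can be illustrated with a minimal sketch. The abstract does not state how the averaging or the model inputs were constructed, so the following rests on assumptions: readings arrive at a fixed 10-minute step, longer time scales are non-overlapping block means, and models are fed a fixed number of past values to predict the next one. The function names `block_average` and `make_windows`, the lag count, and the data are hypothetical and synthetic, not taken from the Ecolumn.

```python
from statistics import mean


def block_average(values, block):
    """Average consecutive non-overlapping blocks of `block` samples.

    A trailing partial block is dropped (an assumption of this sketch).
    """
    return [
        mean(values[i:i + block])
        for i in range(0, len(values) - block + 1, block)
    ]


def make_windows(series, n_lags):
    """Build (features, target) pairs: n_lags past values -> the next value."""
    n_samples = len(series) - n_lags
    features = [series[i:i + n_lags] for i in range(n_samples)]
    targets = [series[i + n_lags] for i in range(n_samples)]
    return features, targets


# Synthetic PM 2.5 readings at 10-minute resolution: two days = 288 samples.
pm25_10min = [20.0 + (i % 6) for i in range(288)]

# Coarser time scales derived from the 10-minute series.
pm25_1h = block_average(pm25_10min, 6)      # 6 x 10 min = 1 h
pm25_24h = block_average(pm25_10min, 144)   # 144 x 10 min = 24 h

# Supervised pairs for the 1-hour scale, using three past hours as input.
X, y = make_windows(pm25_1h, n_lags=3)
```

The resulting `X` and `y` lists could then be passed to any of the model families named in the abstract, for example a tree-based regressor or a recurrent network, with the averaging block size chosen to match the time scale under study.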