Poelzl Michael, Kern Roman, Kecorius Simonas, Lovrić Mario
Institute of Interactive Systems and Data Science, Graz University of Technology, 8010, Graz, Austria.
Know Center Research GmbH, Sandgasse 34, 8010, Graz, Austria.
Sci Rep. 2025 Jan 23;15(1):2919. doi: 10.1038/s41598-025-86550-6.
Modelling of pollutants provides valuable insights into air quality dynamics and aids exposure assessment where direct measurements are not viable. Machine learning (ML) models can be used to explore such dynamics, including the prediction of air pollution concentrations, but they demand extensive training data. To address this, techniques such as transfer learning (TL) leverage knowledge from a model trained on a rich dataset to enhance one trained on a sparse dataset, provided the data distributions are similar. In our experimental setup, we use meteorological and pollutant data from multiple governmental air quality measurement stations in Graz, Austria, supplemented by data from one station in Zagreb, Croatia, to simulate data scarcity. Common ML models such as Random Forests, Multilayer Perceptrons, Long Short-Term Memory networks, and Convolutional Neural Networks are explored to predict particulate matter (PM) in both cities. Our detailed analysis of PM suggests that similarities exist between the cities and their meteorological features and can be further exploited. Hence, TL appears to offer a viable approach to enhance PM predictions for the Zagreb station, despite the challenges posed by data scarcity. Our results demonstrate the feasibility of different TL techniques for improving PM prediction when an ML model trained on all stations in Graz is transferred to Zagreb. We found that selecting time spans based on seasonal patterns not only reduces the amount of data needed for successful TL but also significantly improves prediction performance. Specifically, training a Random Forest model on data from all measurement stations in Graz and transferring it with only 20% of the labelled data from Zagreb resulted in a 22% improvement in prediction performance compared to directly testing the trained model on Zagreb.
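The comparison described in the abstract (a source-trained model tested directly on the target city versus the same model adapted with 20% of the target's labelled data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the two features and their coefficients are invented, a small linear model fitted by gradient descent stands in for the Random Forest and neural network models, and "transfer" is implemented as plain fine-tuning from the source weights.

```python
import math
import random


def make_station(rng, n, offset):
    """Synthetic records: (temperature, wind) features and a PM-like target.

    The coefficients and the city-specific offset are invented for illustration.
    """
    rows = []
    for _ in range(n):
        temp = rng.gauss(0.0, 1.0)  # standardized temperature (hypothetical feature)
        wind = rng.gauss(0.0, 1.0)  # standardized wind speed (hypothetical feature)
        pm = offset - 2.0 * temp - 1.5 * wind + rng.gauss(0.0, 1.0)
        rows.append(((temp, wind), pm))
    return rows


def predict(params, x):
    """Linear prediction w . x + b."""
    w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit(rows, params, lr=0.05, epochs=300):
    """Full-batch gradient descent on mean squared error, starting from `params`."""
    w, b = list(params[0]), params[1]
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in rows:
            err = predict((w, b), x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rmse(params, rows):
    """Root mean squared error of the model on `rows`."""
    return math.sqrt(sum((predict(params, x) - y) ** 2 for x, y in rows) / len(rows))


rng = random.Random(0)
source = make_station(rng, 1000, offset=30.0)  # stands in for the data-rich city
target = make_station(rng, 500, offset=40.0)   # stands in for the data-scarce city

# Keep only the first 20% of the target records as labelled training data.
n_labelled = len(target) // 5
target_train, target_test = target[:n_labelled], target[n_labelled:]

# 1) Train on the source city only, starting from zero weights.
source_model = fit(source, ([0.0, 0.0], 0.0))

# 2) Transfer: continue training from the source weights on the small target subset.
transferred_model = fit(target_train, source_model)

rmse_direct = rmse(source_model, target_test)
rmse_transferred = rmse(transferred_model, target_test)
print(f"direct RMSE on target:      {rmse_direct:.2f}")
print(f"transferred RMSE on target: {rmse_transferred:.2f}")
```

In this toy setting the two cities differ only by a constant offset, so a short fine-tuning run is enough to close most of the gap. Real station data would call for a chronological or seasonal choice of the labelled subset, as the abstract suggests, and for the model families named there.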