Räss Stephan, Leuenberger Markus C
Climate and Environmental Physics, Physics Institute, University of Bern, Bern, Switzerland.
Oeschger Centre for Climate Change Research, University of Bern, Bern, Switzerland.
Front Big Data. 2025 Jan 15;7:1469809. doi: 10.3389/fdata.2024.1469809. eCollection 2024.
Atmospheric ozone chemistry involves many substances and reactions, which makes it a complex system. We analyzed data recorded by Switzerland's National Air Pollution Monitoring Network (NABEL) to showcase the capabilities of machine learning (ML) for the prediction of ozone concentrations (daily averages) and to document a general approach that can be followed by anyone facing similar problems. We evaluated various artificial neural networks and compared them to linear as well as non-linear models derived with ML. The main analyses and the training of the models were performed on atmospheric air data recorded from 2016 to 2023 at the NABEL station Lugano-Università in Lugano, TI, Switzerland. As a first step, we used techniques such as best subset selection to determine the measurement parameters that might be relevant for the prediction of ozone concentrations; in general, the parameters identified by these methods agree with atmospheric ozone chemistry. Based on these results, we constructed various models and used them to predict ozone concentrations in Lugano for the period between January 1, 2024, and March 31, 2024. We then compared the output of our models to the actual measurements and repeated this procedure for two NABEL stations situated in northern Switzerland (Dübendorf-Empa and Zürich-Kaserne). For these stations, predictions were made for the aforementioned period and for the period between January 1, 2023, and December 31, 2023. In most cases, the lowest mean absolute errors (MAE) were provided by a non-linear model with 12 components (different powers and linear combinations of two nitrogen oxide species, SO₂, non-methane volatile organic compounds, temperature and radiation); the MAE of predicted ozone concentrations in Lugano was as low as 9 μg/m³. For the stations in Zürich and Dübendorf, the lowest MAEs were around 11 μg/m³ and 13 μg/m³, respectively. For the tested periods, the accuracy of the best models was approximately 1 μg/m³. Since the aforementioned values are all lower than the standard deviations of the observations, we conclude that using ML for complex data analyses can be very helpful and that artificial neural networks do not necessarily outperform simpler models.
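The workflow summarized in the abstract (select predictors by best subset selection, fit a simple model, then compare the MAE on held-out data with the standard deviation of the observations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than NABEL measurements, the helper names (`fit_ols`, `predict`, `mae`, `best_subset`) are hypothetical, and ranking subsets by validation MAE is one possible selection criterion, not necessarily the one the authors used.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Ordinary least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply coefficients from fit_ols to new rows of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def best_subset(X_train, y_train, X_val, y_val, max_size):
    """Try every predictor subset up to max_size columns and
    return (validation MAE, column indices) of the best one."""
    best_err, best_cols = np.inf, ()
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(X_train.shape[1]), k):
            idx = list(cols)
            coef = fit_ols(X_train[:, idx], y_train)
            err = mae(y_val, predict(coef, X_val[:, idx]))
            if err < best_err:
                best_err, best_cols = err, cols
    return best_err, best_cols


# Synthetic stand-in data (assumption): five candidate predictors,
# of which only columns 0 and 2 actually drive the target.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
y = 50.0 + 8.0 * X[:, 0] - 5.0 * X[:, 2] + rng.normal(scale=2.0, size=n)

# Chronological split: earlier rows for training, later rows held out.
X_train, X_val = X[:300], X[300:]
y_train, y_val = y[:300], y[300:]

best_err, best_cols = best_subset(X_train, y_train, X_val, y_val, max_size=3)

# Baseline used in the abstract's argument: a useful model should have
# an MAE below the standard deviation of the observations.
obs_std = float(np.std(y_val))
print("selected columns:", best_cols)
print("validation MAE:", round(best_err, 2), "observation std:", round(obs_std, 2))
```

Non-linear candidates such as powers of a predictor can be handled by the same search by appending, for example, `X[:, 0] ** 2` as an extra column before calling `best_subset`. Exhaustive search grows combinatorially with the number of candidate columns, so this form is only practical for a small predictor set.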