Fan Kai, Dhammapala Ranil, Harrington Kyle, Lamastro Ryan, Lamb Brian, Lee Yunha
Center for Advanced Systems Understanding, Görlitz, Germany.
Helmholtz-Zentrum Dresden Rossendorf, Dresden, Germany.
Front Big Data. 2022 Feb 10;5:781309. doi: 10.3389/fdata.2022.781309. eCollection 2022.
Chemical transport models (CTMs) are widely used for air quality forecasting, but they require large computational resources and often suffer from systematic biases that cause them to miss severe air pollution events. For example, AIRPACT, a CTM-based operational air quality forecasting system for the Pacific Northwest, uses more than 100 processors for several hours to produce 48-h forecasts each day, yet struggles to capture unhealthy O3 episodes during summer and early fall, especially over Kennewick, WA. This research developed machine learning (ML)-based O3 forecasts for Kennewick, WA, to demonstrate an improved forecasting capability. We used 2017-2020 simulated meteorology and O3 observations from Kennewick as training datasets. The meteorological data come from Weather Research and Forecasting (WRF) model forecasts produced daily by the University of Washington. To improve predictability, our O3 forecasting system consists of two ML models, ML1 and ML2: ML1 combines a random forest (RF) classifier with multiple linear regression (MLR) models, and ML2 uses a two-phase RF regression model with best-fit weighting factors. To guard against overfitting, we evaluated the ML forecasting system with 10-times repeated 10-fold cross-validation and walk-forward cross-validation. Compared with AIRPACT, ML1 improved forecast skill for high-O3 events and captured 5 of 10 unhealthy O3 events, whereas AIRPACT and ML2 missed all of them. ML2 showed better forecast skill for less elevated O3 events. Based on these results, we set up our ML modeling framework to use ML1 for high-O3 events and ML2 for less elevated O3 events. Since May 2019, the framework has produced daily 72-h O3 forecasts and has provided them via the web for clean air agencies and the public: http://ozonematters.com/. Unlike the testing period, the operational forecasting period has not included any unhealthy O3 events.
Nevertheless, the ML modeling framework has demonstrated a reliable forecasting capability at a selected location while using far fewer computational resources: the ML system runs on a single processor for minutes, whereas the CTM-based forecasting system uses more than 100 processors for hours.
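To make the ML1 structure more concrete, here is a minimal Python sketch of a "classify, then regress" forecaster. It assumes that ML1 trains one MLR model per observed O3 category and then uses the classifier's predicted category to choose which MLR model to apply. The function names, category labels, and toy data are hypothetical. A caller-supplied `classify` rule stands in for the random forest classifier, and the MLR fit uses ordinary least squares via the normal equations in pure Python rather than any particular library.

```python
# Minimal sketch of a two-stage "classify, then regress" O3 forecaster.
# Assumption: one MLR model per observed O3 category; at forecast time a
# classifier picks the category. A hypothetical caller-supplied rule stands
# in for the random forest classifier described in the abstract.
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(xs: Sequence[Vector], ys: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, w_1, ..., w_p].
    """
    rows = [[1.0, *x] for x in xs]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(w: Sequence[float], x: Vector) -> float:
    """Evaluate intercept + sum of weight * feature."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_per_category(
    xs: Sequence[Vector], ys: Sequence[float], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Fit a separate MLR model on the days belonging to each O3 category."""
    groups: Dict[str, Tuple[List[Vector], List[float]]] = {}
    for x, y, label in zip(xs, ys, labels):
        gx, gy = groups.setdefault(label, ([], []))
        gx.append(x)
        gy.append(y)
    return {label: fit_mlr(gx, gy) for label, (gx, gy) in groups.items()}


def forecast(
    models: Dict[str, List[float]], x: Vector, classify: Callable[[Vector], str]
) -> float:
    """Route the day to a category with `classify`, then apply that MLR model."""
    return predict_mlr(models[classify(x)], x)
```

The design point is that each regime gets its own linear relationship between meteorology and O3, so a high-O3 regime is not pulled toward the behavior of the far more numerous ordinary days.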
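The two validation schemes named in the abstract can be sketched as index generators over date-ordered daily records. This is a minimal Python sketch under two assumptions: "10-time, 10-fold" is read as 10-fold cross-validation repeated 10 times with different random shuffles, and walk-forward validation is taken to use an expanding training window followed by a fixed-size test window. The function names and window sizes are hypothetical.

```python
# Minimal sketch of repeated k-fold and walk-forward splits.
# Assumptions: records are indexed 0..n-1 in date order; walk-forward uses
# an expanding training window. Names and sizes are hypothetical.
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def repeated_kfold(n: int, k: int = 10, repeats: int = 10, seed: int = 0) -> Iterator[Split]:
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh shuffle for every repetition
        for f in range(k):
            test = sorted(idx[f::k])  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in range(n) if i not in test_set]
            yield train, test


def walk_forward(n: int, initial_train: int, test_size: int) -> Iterator[Split]:
    """Expanding-window splits: train on [0, t), test on [t, t + test_size)."""
    if initial_train < 1 or test_size < 1:
        raise ValueError("initial_train and test_size must be positive")
    t = initial_train
    while t < n:
        yield list(range(t)), list(range(t, min(t + test_size, n)))
        t += test_size
```

Shuffled k-fold mixes past and future days, so it mainly checks how well the model fits the data overall. Walk-forward validation never trains on days later than the test window, which more closely mimics how the forecasts are produced operationally.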