Department of Safety, Health and Environmental Engineering, Ming Chi University of Technology, Taiwan; Center for Environmental Sustainability and Human Health, Ming Chi University of Technology, Taiwan.
Department of Geomatics, National Cheng Kung University, Tainan, Taiwan.
Chemosphere. 2022 Aug;301:134758. doi: 10.1016/j.chemosphere.2022.134758. Epub 2022 Apr 28.
It is well known that benzene negatively impacts human health. This study is the first to predict spatial-temporal variations in benzene concentrations for the entirety of Taiwan by using a mixed spatial prediction model that integrates multiple machine learning algorithms with predictor variables selected by Land-use Regression (LUR). Monthly benzene concentrations from 2003 to 2019 were used for model development; monthly benzene concentration data from 2020, together with mobile monitoring vehicle data from 2009 to 2019, served as external data for verifying model reliability. Benzene concentrations were estimated with six LUR-based machine learning algorithms, namely random forest (RF), deep neural network (DNN), gradient boosting (GBoost), light gradient boosting (LightGBM), CatBoost, and extreme gradient boosting (XGBoost), plus an ensemble combining the three best-performing models; these methods can capture nonlinear relationships between observations and predictions. The results indicated that conventional LUR captured 79% of the variability in benzene concentrations. Notably, the LUR with ensemble algorithm (GBoost, CatBoost, and XGBoost) surpassed all other integrated methods, increasing the explanatory power to 92%. This study establishes the value of the proposed ensemble-based model for estimating spatiotemporal variation in benzene exposure.
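The "three best-performing models" ensemble described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the three models were combined, so the unweighted average used here is an assumption. The data are synthetic, the four predictor columns are hypothetical stand-ins for LUR-selected variables, and scikit-learn regressors stand in for LightGBM, CatBoost, XGBoost, and the DNN so the example stays self-contained.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_samples=600, seed=0):
    """Synthetic stand-in for LUR-selected predictors and a pollutant level."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_samples, 4))  # hypothetical predictors
    # Nonlinear response with an interaction term plus measurement noise.
    y = (
        2.0 * X[:, 0]
        + np.sin(6.0 * X[:, 1])
        + 3.0 * X[:, 2] * X[:, 3]
        + rng.normal(0.0, 0.1, size=n_samples)
    )
    return X, y


def fit_top3_ensemble(X_train, y_train, X_val, y_val):
    """Fit candidate models, rank them by validation R^2, average the top three."""
    candidates = {
        "linear": LinearRegression(),  # plays the role of a conventional LUR
        "rf": RandomForestRegressor(n_estimators=100, random_state=0),
        "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
        "gboost": GradientBoostingRegressor(random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_val, model.predict(X_val))

    # Keep the three models with the highest validation R^2.
    top3 = sorted(scores, key=scores.get, reverse=True)[:3]

    def predict(X):
        # Assumption: unweighted mean of the three selected models.
        return np.mean([candidates[name].predict(X) for name in top3], axis=0)

    return predict, top3, scores


if __name__ == "__main__":
    X, y = make_synthetic_data()
    # Hold out a test split as a rough analogue of external verification data.
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_dev, y_dev, test_size=0.25, random_state=0
    )
    predict, top3, scores = fit_top3_ensemble(X_train, y_train, X_val, y_val)
    print("validation R^2 per model:", scores)
    print("selected models:", top3)
    print("ensemble R^2 on held-out data:", r2_score(y_test, predict(X_test)))
```

Ranking on a validation split and scoring the ensemble on a separate held-out split keeps model selection from inflating the reported R², in the same spirit as the external 2020 and mobile-monitoring data mentioned in the abstract.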