Feature selection for global tropospheric ozone prediction based on the BO-XGBoost-RFE algorithm.

Author information

School of Computer Science, Liaocheng University, Liaocheng, 252000, China.

School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China.

Publication information

Sci Rep. 2022 Jun 2;12(1):9244. doi: 10.1038/s41598-022-13498-2.

Abstract

Ozone is one of the most important air pollutants, with significant impacts on human health, regional air quality and ecosystems. In this study, we use geographic and environmental information from monitoring sites in 5577 regions worldwide, collected from 2010 to 2014, as input features to predict each site's long-term average ozone concentration. We propose BO-XGBoost-RFE, an XGBoost-based recursive feature elimination (RFE) model tuned by Bayesian optimization, and use several machine learning algorithms to predict ozone concentration from the optimal feature subset. Because recursive feature elimination depends on the hyperparameters of the underlying model, different hyperparameter combinations yield different feature subsets, so the subset obtained under any single setting may not be optimal. We therefore use Bayesian optimization to tune the parameters of XGBoost-based recursive feature elimination, obtaining the optimal parameter combination and the optimal feature subset under that combination. Experiments on global long-term ozone concentration prediction show that models trained after BO-XGBoost-RFE feature selection achieve higher prediction accuracy than models trained on all features or on features selected by Pearson correlation. Among the four prediction models, random forest achieved the highest prediction accuracy, and XGBoost showed the greatest improvement in accuracy.
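
The idea described in the abstract, tuning the hyperparameters of XGBoost-based recursive feature elimination with Bayesian optimization, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the data are synthetic (`make_regression`), the search ranges for `max_depth`, `learning_rate` and `n_features_to_select` are hypothetical, and the Bayesian optimizer is a simple Gaussian-process surrogate with an expected-improvement acquisition, since the abstract does not say which surrogate or acquisition function was used. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingRegressor` as a stand-in.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

try:
    from xgboost import XGBRegressor as Booster
except ImportError:  # stand-in so the sketch still runs without xgboost
    from sklearn.ensemble import GradientBoostingRegressor as Booster


def make_booster(max_depth, learning_rate):
    # Small ensemble to keep the illustration fast.
    return Booster(n_estimators=30, max_depth=max_depth,
                   learning_rate=learning_rate, random_state=0)


def decode(u, n_features):
    """Map a point in the unit cube to hyperparameters (hypothetical ranges)."""
    return {
        "max_depth": int(round(2 + 3 * u[0])),            # 2..5
        "learning_rate": float(0.05 + 0.25 * u[1]),       # 0.05..0.30
        "n_features_to_select": int(round(2 + (n_features - 2) * u[2])),
    }


def objective(u, X, y):
    """Cross-validated R^2 of RFE followed by a booster on the kept features."""
    p = decode(u, X.shape[1])
    model = make_pipeline(
        RFE(make_booster(p["max_depth"], p["learning_rate"]),
            n_features_to_select=p["n_features_to_select"], step=1),
        make_booster(p["max_depth"], p["learning_rate"]),
    )
    return cross_val_score(model, X, y, cv=3, scoring="r2").mean()


def bayes_opt(X, y, n_init=4, n_iter=6, seed=0):
    """Maximize the objective with a GP surrogate and expected improvement."""
    rng = np.random.default_rng(seed)
    U = rng.random((n_init, 3))                 # random initial design
    scores = [objective(u, X, y) for u in U]
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3,
                                      normalize_y=True, random_state=seed)
        gp.fit(U, scores)
        cand = rng.random((200, 3))             # random candidate points
        mu, sigma = gp.predict(cand, return_std=True)
        sigma = np.maximum(sigma, 1e-9)         # avoid division by zero
        gain = mu - max(scores)
        ei = gain * norm.cdf(gain / sigma) + sigma * norm.pdf(gain / sigma)
        u_next = cand[np.argmax(ei)]
        U = np.vstack([U, u_next])
        scores.append(objective(u_next, X, y))
    best = int(np.argmax(scores))
    return decode(U[best], X.shape[1]), scores


# Synthetic stand-in for the site-level features and ozone targets.
X, y = make_regression(n_samples=150, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
best_params, history = bayes_opt(X, y)

# Refit RFE on all data with the best setting to read off the feature subset.
selector = RFE(make_booster(best_params["max_depth"],
                            best_params["learning_rate"]),
               n_features_to_select=best_params["n_features_to_select"],
               step=1).fit(X, y)
selected = selector.support_
print(best_params, np.flatnonzero(selected))
```

Wrapping RFE and the final booster in one pipeline means feature elimination is repeated inside each cross-validation fold, so the score used by the optimizer is not inflated by selecting features on the held-out data. The selected columns could then be passed to any downstream regressor, such as a random forest, for the comparison the abstract describes.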

