Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods.

Author information

Physical Geography and Regional Geographic Analysis, University of Seville, Seville 41004, Spain; Geography and Environment, School of Geography, University of Southampton, Southampton SO17 1BJ, United Kingdom.

Unidad del IGME en Granada, Urbanización Alcazar del Genil, 4, 18006 Granada, Spain.

Publication information

Sci Total Environ. 2018 May 15;624:661-672. doi: 10.1016/j.scitotenv.2017.12.152. Epub 2017 Dec 27.

PMID:29272835

Abstract

Recognising the various sources of nitrate pollution and understanding system dynamics are fundamental to tackling groundwater quality problems. A comprehensive GIS database of twenty parameters describing hydrogeological and hydrological features and driving forces was used as input for predictive models of nitrate pollution. Additionally, key variables extracted from remotely sensed Normalised Difference Vegetation Index (NDVI) time series were included in the database to provide indications of agroecosystem dynamics. Many approaches can be used to evaluate feature importance related to groundwater pollution caused by nitrates. Here, filters, wrappers and embedded methods are used to rank feature importance according to the probability of nitrate occurring above a threshold value in groundwater. Machine learning algorithms (MLA) such as Classification and Regression Trees (CART), Random Forest (RF) and Support Vector Machines (SVM) are used as wrappers with four sequential search approaches: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). Feature importance obtained from RF and CART was used as an embedded approach. RF with SFFS had the best performance (mean misclassification error, mmce = 0.12; AUC = 0.92) and good interpretability, selecting three features related to groundwater polluted areas: i) a rating of industries and facilities according to their production capacity and total nitrogen emissions to water within a 3 km buffer, ii) a rating of livestock farms by manure production within a 5 km buffer, and iii) cumulated NDVI for the post-maximum month, used as a proxy of vegetation productivity and crop yield.
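To make the floating search named in the abstract more concrete, the following is a minimal Python sketch of sequential forward floating selection (SFFS). It is not the authors' implementation. The function names `sffs` and `toy_score`, the feature labels `a` to `d`, and the integer score table are all hypothetical and invented for illustration. In a real wrapper, the score would come from the cross-validated performance of a learner such as RF (for example AUC, or 1 minus mmce) rather than from a lookup table.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Subset = List[str]


def sffs(features: Sequence[str],
         score: Callable[[Subset], float],
         k_max: int) -> Tuple[float, Subset]:
    """Sequential forward floating selection; higher score is better (k_max >= 1)."""
    selected: Subset = []
    best: Dict[int, Tuple[float, Subset]] = {}  # subset size -> best (score, subset) seen

    while len(selected) < k_max:
        # Forward step: add the single feature that raises the score the most.
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, selected)

        # Floating step: drop features while the reduced subset strictly beats
        # the best subset recorded so far at that smaller size.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            reduced_score = score(reduced)
            if reduced_score <= best[len(reduced)][0]:
                break
            selected = reduced
            best[len(reduced)] = (reduced_score, reduced)

    return max(best.values(), key=lambda pair: pair[0])


# Hypothetical toy scores (integers, invented for illustration only):
# "a" looks best alone but is redundant with "b" and "c";
# "b" and "c" are strong together; "d" is a noise feature.
BASE = {"a": 60, "b": 50, "c": 50, "d": -5}


def toy_score(subset: Subset) -> int:
    total = sum(BASE[f] for f in subset)
    if "b" in subset and "c" in subset:
        total += 50  # synergy between b and c
    if "a" in subset:
        total -= 40 * sum(f in subset for f in ("b", "c"))  # redundancy with a
    return total


best_score, best_subset = sffs(["a", "b", "c", "d"], toy_score, k_max=3)
print(best_subset, best_score)  # prints: ['b', 'c'] 150
```

In this toy case a purely forward search would keep `a` once it has been added. The floating step removes `a` after `b` and `c` are both in the subset, which is the behaviour that distinguishes SFFS from SFS. SBFS mirrors the same idea, starting from the full feature set and conditionally re-adding features.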
