Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods.

Author information

Physical Geography and Regional Geographic Analysis, University of Seville, Seville 41004, Spain; Geography and Environment, School of Geography, University of Southampton, Southampton SO17 1BJ, United Kingdom.

Unidad del IGME en Granada, Urbanización Alcazar del Genil, 4, 18006 Granada, Spain.

Publication information

Sci Total Environ. 2018 May 15;624:661-672. doi: 10.1016/j.scitotenv.2017.12.152. Epub 2017 Dec 27.

Abstract

Recognising the various sources of nitrate pollution and understanding system dynamics are fundamental to tackling groundwater quality problems. A comprehensive GIS database of twenty parameters describing hydrogeological and hydrological features and driving forces was used as input for predictive models of nitrate pollution. In addition, key variables extracted from remotely sensed Normalised Difference Vegetation Index (NDVI) time series were included in the database to provide indications of agroecosystem dynamics. Many approaches can be used to evaluate feature importance in relation to groundwater pollution caused by nitrates. Filter, wrapper and embedded methods are used to rank feature importance according to the probability that nitrate concentrations in groundwater exceed a threshold value. Machine learning algorithms (MLA) such as Classification and Regression Trees (CART), Random Forest (RF) and Support Vector Machines (SVM) are used as wrappers with four sequential search approaches: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). Feature importance obtained from RF and CART was used as an embedded approach. RF with SFFS had the best performance (mean misclassification error, mmce = 0.12; AUC = 0.92) and good interpretability, selecting three features related to groundwater polluted areas: i) industries and facilities rated according to their production capacity and total nitrogen emissions to water within a 3 km buffer; ii) livestock farms rated by manure production within a 5 km buffer; and iii) cumulated NDVI for the post-maximum month, used as a proxy for vegetation productivity and crop yield.
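To make the floating search concrete, the following is a minimal sketch of sequential forward floating selection (SFFS) in plain Python. It is not the authors' implementation: the function name `sffs`, the feature names `a`, `b`, `c` and the lookup table `TOY_SCORES` are all assumptions invented for illustration, and the scores are made-up numbers, not results from the study. In a real wrapper, the score would come from cross-validating a learner such as RF on each candidate subset.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def sffs(features: Sequence[str],
         score: Callable[[Subset], float],
         max_size: int) -> Tuple[float, Subset]:
    """Sequential forward floating selection (illustrative sketch).

    Returns the best (score, subset) pair seen over all subset sizes.
    """
    max_size = min(max_size, len(features))
    if max_size < 1:
        raise ValueError("max_size must be at least 1")

    best_by_size: Dict[int, Tuple[float, Subset]] = {}
    current: Subset = frozenset()

    while len(current) < max_size:
        # Forward step: add the single feature that maximises the score.
        candidates = [current | {f} for f in features if f not in current]
        current = max(candidates, key=score)
        s = score(current)
        k = len(current)
        if k not in best_by_size or s > best_by_size[k][0]:
            best_by_size[k] = (s, current)

        # Floating step: keep dropping one feature while the reduced subset
        # strictly beats the best subset of that size found so far.
        while len(current) > 2:
            reduced = max((current - {f} for f in sorted(current)), key=score)
            r = score(reduced)
            if r > best_by_size[len(reduced)][0]:
                current = reduced
                best_by_size[len(reduced)] = (r, reduced)
            else:
                break

    return max(best_by_size.values(), key=lambda pair: pair[0])


# Hypothetical toy scores (higher is better), chosen so that the best single
# feature "a" is NOT part of the best pair {"b", "c"}.
TOY_SCORES: Dict[Subset, float] = {
    frozenset({"a"}): 0.60,
    frozenset({"b"}): 0.50,
    frozenset({"c"}): 0.50,
    frozenset({"a", "b"}): 0.70,
    frozenset({"a", "c"}): 0.70,
    frozenset({"b", "c"}): 0.90,
    frozenset({"a", "b", "c"}): 0.85,
}

best_score, best_subset = sffs(["a", "b", "c"], TOY_SCORES.__getitem__, 3)
print(best_score, sorted(best_subset))  # 0.9 ['b', 'c']
```

In this toy case, plain forward selection (SFS) would start with `a` and never visit the pair `{b, c}`. The floating step reaches it by dropping `a` after all three features have been added, which is the behaviour that distinguishes SFFS from SFS.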
