Department of Biosystems and Agricultural Engineering, 524 S. Shaw Lane, Room 216, Michigan State University, East Lansing, MI 48824, USA.
Sci Total Environ. 2015 Apr 1;511:341-53. doi: 10.1016/j.scitotenv.2014.12.066. Epub 2014 Dec 29.
Variable selection is a critical step in the development of empirical stream health prediction models. This study develops a framework for selecting important in-stream variables to predict four measures of biological integrity: total number of Ephemeroptera, Plecoptera, and Trichoptera (EPT) taxa, family index of biotic integrity (FIBI), Hilsenhoff biotic integrity (HBI), and fish index of biotic integrity (IBI). Over 200 flow regime and water quality variables were calculated using the Hydrologic Index Tool (HIT) and the Soil and Water Assessment Tool (SWAT). Streams of the River Raisin watershed in Michigan were grouped in three ways: by the Strahler stream classification system (orders 1-3 and orders 4-6), by k-means clustering (two clusters: C1 and C2), and as a single group containing all streams. For each grouping, variable selection was performed using Bayesian variable selection, principal component analysis, and Spearman's rank correlation. Following selection of the best variable sets, models were developed to predict the measures of biological integrity using adaptive neuro-fuzzy inference systems (ANFIS), a technique well suited to complex, nonlinear ecological problems. Multiple unique variable sets were identified, all of which differed by selection method and stream grouping. The final best models were mostly built using the Bayesian variable selection method. The most effective stream grouping method varied by health measure, although models built with k-means clustering or grouping by stream order were always superior to models built without grouping. Commonly selected variables were related to streamflow magnitude, rate of change, and seasonal nitrate concentration. Each best model was effective in simulating stream health observations, with validation R² for EPT taxa ranging from 0.67 to 0.92, for FIBI from 0.49 to 0.85, for HBI from 0.56 to 0.75, and for fish IBI equal to 0.99 for all best models.
The comprehensive variable selection and modeling process proposed here is a robust method that extends our understanding of watershed scale stream health beyond sparse monitoring points.
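As an illustration of one of the three selection methods named above, the following is a minimal sketch of screening candidate variables by Spearman's rank correlation against a biological integrity score. It is not the authors' implementation: the function names, the 0.6 threshold, the variable names, and the toy numbers are all assumptions made for this example and do not come from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def screen_variables(candidates, target, threshold=0.6):
    """Keep variables whose |rho| with the target meets the threshold.

    The threshold is a hypothetical choice for this sketch.
    """
    scores = {name: spearman(vals, target) for name, vals in candidates.items()}
    kept = [name for name, rho in scores.items() if abs(rho) >= threshold]
    return kept, scores


# Toy, made-up values for five hypothetical stream reaches (not study data).
candidates = {
    "flow_magnitude": [1.0, 2.0, 3.0, 4.0, 5.0],
    "spring_nitrate": [5.0, 4.0, 3.0, 2.0, 1.0],
    "unrelated": [3.0, 1.0, 4.0, 2.0, 5.0],
}
ept_taxa = [2, 4, 5, 8, 9]

kept, scores = screen_variables(candidates, ept_taxa)
```

In a workflow like the one described, a screen of this kind would be run separately for each stream grouping and each health measure, and the retained variables would then be passed to the prediction model.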