Smith Adam, Hofner Benjamin, Lamb Juliet S, Osenkowski Jason, Allison Taber, Sadoti Giancarlo, McWilliams Scott R, Paton Peter
Department of Natural Resources Science, University of Rhode Island, Kingston, Rhode Island.
Present address: United States Fish and Wildlife Service, National Wildlife Refuge System, Inventory and Monitoring Branch, Athens, Georgia.
Ecol Evol. 2019 Feb 14;9(5):2346-2364. doi: 10.1002/ece3.4738. eCollection 2019 Mar.
Modeling organism distributions from survey data involves numerous statistical challenges, including accounting for zero-inflation and overdispersion and selecting and incorporating environmental covariates. In environments with high spatial and temporal variability, addressing these challenges often requires numerous assumptions regarding organism distributions and their relationships to biophysical features. These assumptions may limit the resolution or accuracy of predictions resulting from survey-based distribution models. We propose an iterative modeling approach that incorporates a negative binomial hurdle, followed by modeling of the relationship of organism distribution and abundance to environmental covariates using generalized additive models (GAM) and generalized additive models for location, scale, and shape (GAMLSS). Our approach accounts for key features of survey data by separating binary (presence-absence) from count (abundance) data, separately modeling the mean and dispersion of count data, and incorporating selection of appropriate covariates and response functions from a suite of potential covariates while avoiding overfitting. We apply our modeling approach to surveys of sea duck abundance and distribution in Nantucket Sound (Massachusetts, USA), which has been proposed as a location for offshore wind energy development. Our model results highlight the importance of spatiotemporal variation in this system and identify key habitat features, including distance to shore, sediment grain size, and seafloor topographic variation. Our work provides a powerful, flexible, and highly repeatable modeling framework with minimal assumptions that can be broadly applied to the modeling of survey data with high spatiotemporal variability. Applying GAMLSS models to the count portion of survey data allows us to incorporate potential overdispersion, which can dramatically affect model results in highly dynamic systems.
Our approach is particularly relevant to systems in which little a priori knowledge is available regarding relationships between organism distributions and biophysical features, since it incorporates simultaneous selection of covariates and their functional relationships with organism responses.
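The hurdle structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the count component is a zero-truncated negative binomial with mean mu and variance mu + sigma * mu^2 (a common parameterization in GAMLSS software), and the function names nb_pmf and hurdle_nb_pmf are hypothetical. In a full model, the occupancy probability pi, the mean mu, and the dispersion sigma would each be linked to environmental covariates through additive smooth terms; here they are passed in as plain numbers.

```python
import math


def nb_pmf(y, mu, sigma):
    """Negative binomial pmf with mean mu and variance mu + sigma * mu**2.

    Assumed parameterization for illustration only.
    """
    r = 1.0 / sigma          # size parameter
    p = r / (r + mu)         # success probability implied by mean mu
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def hurdle_nb_pmf(y, pi, mu, sigma):
    """P(Y = y) under a hurdle model.

    A zero occurs with probability 1 - pi (binary, presence-absence part).
    Positive counts follow a zero-truncated negative binomial (count part),
    so the mean and dispersion are modeled separately from occupancy.
    """
    if y == 0:
        return 1.0 - pi
    return pi * nb_pmf(y, mu, sigma) / (1.0 - nb_pmf(0, mu, sigma))


if __name__ == "__main__":
    # Hypothetical values: 30% of survey segments occupied, overdispersed counts.
    for y in range(4):
        print(y, hurdle_nb_pmf(y, pi=0.3, mu=5.0, sigma=2.0))
```

Because the zeros are handled entirely by pi, a larger sigma changes only how the positive counts are spread out, which is the separation of presence-absence from abundance that the abstract describes.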