Huang Jian, Frimpong Emmanuel A
Department of Fish and Wildlife Conservation, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America.
PLoS One. 2015 Jun 15;10(6):e0129995. doi: 10.1371/journal.pone.0129995. eCollection 2015.
Understanding the spatial pattern of species distributions is fundamental in biogeography and in conservation and resource-management applications. Most species distribution models (SDMs) require or prefer species presence and absence data for adequate estimation of model parameters. However, observations with unreliable or unreported species absences dominate the available data and limit the implementation of SDMs. Presence-only models generally yield less accurate predictions of species distribution, and they make it difficult to incorporate spatial autocorrelation. The large number of historical presence records available for freshwater fishes of the United States provides an opportunity to derive reliable absences from data reported as presence-only, provided that sampling was predominantly community-based. In this study, we used boosted regression trees (BRT), logistic regression, and MaxEnt models to assess how well a historical metacommunity database with inferred absences supports modeling of fish distributions, and thereby to investigate the effects of model choice and data properties. With models of the distribution of 76 native, non-game fish species of varied traits and rarity attributes in four river basins across the United States, we show that model accuracy depends on data quality (e.g., sample size, location precision), species' rarity, statistical modeling technique, and consideration of spatial autocorrelation. The cross-validation area under the receiver-operating-characteristic curve (AUC) tended to be high in the spatial presence-absence models at the highest level of resolution for species with large geographic ranges and small local populations. Prevalence affected training AUC but not validation AUC. The key habitat predictors identified and the fish-habitat relationships evaluated through partial dependence plots were consistent with most previous studies.
The community-based SDM framework broadens our capability to model species distributions by removing the constraint imposed by the lack of species absence data. It thus provides robust predictions of distribution for stream fishes in other regions where historical data exist, and for other taxa (e.g., benthic macroinvertebrates, birds) usually observed by community-based sampling designs.
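The core idea of inferring absences from community-based presence records can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes that every site appearing in the records was sampled for the whole fish community, and it treats every species in the combined list as eligible at every site. The function name `infer_absences` and the `(site, species)` record layout are hypothetical.

```python
from collections import defaultdict


def infer_absences(records):
    """Build a site-by-species presence/absence table from presence-only records.

    records: iterable of (site_id, species) pairs; each pair means the species
    was reported present at that site.

    ASSUMPTION (hypothetical, for illustration): every site in `records` was
    sampled for the whole community, so a species not reported at a sampled
    site is treated as an inferred absence (0).
    """
    # Collect the set of species reported at each sampled site.
    present = defaultdict(set)
    for site, species in records:
        present[site].add(species)

    # Species pool: every species reported anywhere in the records.
    all_species = sorted({sp for spp in present.values() for sp in spp})

    # 1 = reported present, 0 = inferred absent.
    return {
        site: {sp: int(sp in present[site]) for sp in all_species}
        for site in sorted(present)
    }


table = infer_absences([("s1", "A"), ("s1", "B"), ("s2", "A"), ("s3", "C")])
print(table["s2"])  # {'A': 1, 'B': 0, 'C': 0}
```

A real application would likely restrict the eligible species pool for each site (for example, to species whose known range includes that site's river basin) before assigning inferred absences; this sketch does not do that.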