Why sampling ratio matters: Logistic regression and studies of habitat use.

Affiliations

Institute of Forest Ecology, Slovak Academy of Sciences, Zvolen, Slovakia.

Institute of Biology and Ecology, Faculty of Science, P. J. Šafárik University in Košice, Košice, Slovakia.

Publication information

PLoS One. 2018 Jul 23;13(7):e0200742. doi: 10.1371/journal.pone.0200742. eCollection 2018.

DOI: 10.1371/journal.pone.0200742

PMID: 30036369

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6056037/

Abstract

Logistic regression (LR) models are among the most frequently used statistical tools in ecology. With LR one can infer whether a species' habitat use is related to environmental factors and estimate the probability of species occurrence from the values of these factors. However, studies often use inadequate sampling with regard to the arbitrarily chosen ratio between occupied and unoccupied (or available) locations, and this has a profound effect on the inference and predictive power of LR models. To demonstrate the effect of various sampling strategies and efforts on the quality of LR models, we used a unique census dataset containing all roosting cavities used by the tree-dwelling bat Nyctalus leisleri and all cavities where the species was absent. We compared models constructed from randomly selected data subsets with varying ratios of occupied to unoccupied cavities (1:1, 1:5, 1:10) with a full-dataset model (ratio 1:31). These comparisons revealed that the power of LR models was low when the sampling did not reflect the population ratio of occupied to unoccupied cavities. The use of weights improved the subsampled models. Thus, this study warns against inadequate data sampling and strongly encourages a randomized sampling procedure to estimate the true occupied:unoccupied ratio, which can then be used to optimize a manageable sampling effort and to apply weights that improve the LR model. Such an approach may provide robust and reliable models suitable for both inference and prediction.
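The abstract makes two claims. First, subsampling unoccupied locations at an arbitrary ratio (1:1, 1:5, 1:10) distorts an LR model relative to a census where the true ratio is about 1:31. Second, weights can repair the subsampled models. The minimal sketch below illustrates both claims on simulated data, not on the N. leisleri dataset. Several things in it are assumptions and do not come from the paper:

- the census itself, with one hypothetical covariate `x`, an intercept of -4 and a slope of 1;
- the function names `fit_weighted_logit`, `simulate_census` and `subsample`;
- the weighting scheme. Each sampled unoccupied location is weighted by the inverse of its sampling fraction, N_unoccupied / n_sampled. This is a common choice but may differ from the authors' exact method.

The model is fitted by Newton-Raphson on the weighted log-likelihood using only the Python standard library.

```python
import math
import random


def fit_weighted_logit(x, y, w, max_iter=100, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by maximising the weighted log-likelihood
    sum_i w_i * [y_i*eta_i - log(1 + exp(eta_i))] with Newton-Raphson."""
    # Start the intercept at the weighted log-odds of occupancy, slope at 0.
    w_occ = sum(wi for wi, yi in zip(w, y) if yi == 1)
    w_unocc = sum(wi for wi, yi in zip(w, y) if yi == 0)
    b0, b1 = math.log(w_occ / w_unocc), 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)        # contribution to the score (gradient)
            v = wi * p * (1.0 - p)   # contribution to the information matrix
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def simulate_census(n=20000, b0=-4.0, b1=1.0, seed=1):
    """Hypothetical census: every location is surveyed; y=1 means occupied."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) else 0
         for xi in x]
    return x, y


def subsample(x, y, k, seed=2):
    """Keep all occupied locations plus k unoccupied ones per occupied one
    (ratio 1:k). Each sampled unoccupied location gets weight
    N_unoccupied / n_sampled (inverse sampling fraction, an assumption)."""
    rng = random.Random(seed)
    occ = [i for i, yi in enumerate(y) if yi == 1]
    unocc = [i for i, yi in enumerate(y) if yi == 0]
    picked = rng.sample(unocc, k * len(occ))
    idx = occ + picked
    w_unocc = len(unocc) / len(picked)
    xs = [x[i] for i in idx]
    ys = [y[i] for i in idx]
    ws = [1.0 if y[i] == 1 else w_unocc for i in idx]
    return xs, ys, ws


if __name__ == "__main__":
    x, y = simulate_census()
    n_occ = sum(y)
    print(f"census ratio occupied:unoccupied = 1:{(len(y) - n_occ) / n_occ:.0f}")
    full = fit_weighted_logit(x, y, [1.0] * len(y))
    print(f"full census      b0={full[0]:6.2f} b1={full[1]:5.2f}")
    for k in (1, 5, 10):
        xs, ys, ws = subsample(x, y, k)
        unw = fit_weighted_logit(xs, ys, [1.0] * len(ys))
        wtd = fit_weighted_logit(xs, ys, ws)
        print(f"1:{k:<2} unweighted b0={unw[0]:6.2f} b1={unw[1]:5.2f} | "
              f"weighted b0={wtd[0]:6.2f} b1={wtd[1]:5.2f}")
```

Under this setup the unweighted subsample fits should keep a slope close to the census value. Their intercept, however, should be shifted upward by roughly ln(N_unoccupied / n_sampled), so predicted occurrence probabilities are inflated. The weighted fits should recover approximately the census intercept, because the weights let the sampled unoccupied locations stand in for the whole unoccupied population. This is why the true occupied:unoccupied ratio has to be estimated before weights can be applied.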