Dumelle Michael, Higham Matt, Ver Hoef Jay M, Olsen Anthony R, Madsen Lisa
United States Environmental Protection Agency, 200 SW 35th St, Corvallis, Oregon, 97333.
St. Lawrence University Department of Mathematics, Computer Science, and Statistics, 23 Romoda Drive, Canton, New York, 13617.
Methods Ecol Evol. 2022 Sep 1;13(9):2018-2029. doi: 10.1111/2041-210X.13919.
The design-based and model-based approaches to frequentist statistical inference rest on fundamentally different foundations. In the design-based approach, inference relies on random sampling. In the model-based approach, inference relies on distributional assumptions. We compare the approaches in a finite population spatial context.
We provide relevant background for the design-based and model-based approaches and then study their performance using simulated data and real data. The real data come from the United States Environmental Protection Agency's 2012 National Lakes Assessment. A variety of sample sizes, location layouts, dependence structures, and response types are considered. The population mean is the parameter of interest, and performance is measured using statistics such as bias, squared error, and interval coverage.
When studying the simulated and real data, we found that regardless of the strength of spatial dependence in the data, the generalized random tessellation stratified (GRTS) algorithm, which explicitly incorporates spatial locations into sampling, tends to outperform the simple random sampling (SRS) algorithm, which does not. We also found that model-based inference tends to outperform design-based inference, even for skewed data where the model-based distributional assumptions are violated. The performance gap between design-based and model-based inference is small when GRTS samples are used but large when SRS samples are used, suggesting that the sampling choice (GRTS or SRS) matters most when performing design-based inference.
There are many benefits and drawbacks to the design-based and model-based approaches for finite population spatial sampling and inference that practitioners must consider when choosing between them.
We provide relevant background contextualizing each approach and study their properties in a variety of scenarios, making recommendations for use based on the practitioner's goals.
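To make the performance statistics concrete, here is a minimal sketch of how bias, squared error, and interval coverage can be estimated by repeated sampling for design-based inference under SRS. It assumes sampling without replacement, the sample mean as the estimator, and a normal-approximation 95% interval with a finite population correction; the lognormal population is hypothetical synthetic data (not National Lakes Assessment data), and the function names `srs_mean_ci` and `evaluate_srs` are illustrative. It does not implement GRTS sampling or the model-based estimator studied in the paper.

```python
import math
import random
import statistics


def srs_mean_ci(sample, population_size, z=1.96):
    """Design-based estimate of a finite population mean from a simple random
    sample drawn without replacement, with a normal-approximation interval."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s2 = statistics.variance(sample)    # sample variance, n - 1 denominator
    fpc = 1 - n / population_size       # finite population correction
    se = math.sqrt(fpc * s2 / n)        # standard error of the sample mean
    return mean, (mean - z * se, mean + z * se)


def evaluate_srs(population, n, trials=2000, seed=1):
    """Repeatedly draw SRS samples of size n and summarize estimator
    performance against the known population mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    N = len(population)
    errors, covered = [], 0
    for _ in range(trials):
        sample = rng.sample(population, n)           # SRS without replacement
        est, (lo, hi) = srs_mean_ci(sample, N)
        errors.append(est - true_mean)
        covered += lo <= true_mean <= hi             # does the interval cover?
    bias = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean(e * e for e in errors))  # root mean squared error
    return {"bias": bias, "rmse": rmse, "coverage": covered / trials}


# Hypothetical skewed population of 1000 units (lognormal responses).
pop_rng = random.Random(42)
population = [pop_rng.lognormvariate(0.0, 1.0) for _ in range(1000)]
print(evaluate_srs(population, n=50))
```

Because the population is fully known in a simulation, the true mean is available, so bias and squared error are simple averages over replicates and coverage is the fraction of intervals containing the true mean. With strongly skewed responses, coverage of this normal-approximation interval can fall somewhat below the nominal 95%.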