GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data.

Authors

Thomson Dana R, Stevens Forrest R, Ruktanonchai Nick W, Tatem Andrew J, Castro Marcia C

Affiliations

Department of Social Statistics and Demography, University of Southampton, Building 58, Southampton, SO17 1BJ, UK.

WorldPop, Department of Geography and Environment, University of Southampton, Building 44, Southampton, SO17 1BJ, UK.

Publication information

Int J Health Geogr. 2017 Jul 19;16(1):25. doi: 10.1186/s12942-017-0098-4.

Abstract

BACKGROUND

Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Survey samples are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSUs) for complex household surveys with gridded population data. Given a gridded population dataset and the geographic boundary of the study area, GridSample uses a two-step process: it first samples "seed" cells with probability proportionate to estimated population size, then "grows" PSUs until a minimum population is reached in each PSU. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allow spatial oversampling, which is not possible in typical surveys and may improve small-area estimates derived from survey results.
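The two-step process described above (seed selection with probability proportionate to size, then PSU growth) can be illustrated with a small sketch. This is not the GridSample package or its API: it is written in Python rather than R, and the function names `sample_seeds_pps`, `grow_psu`, and `select_psus`, the toy grid, and the breadth-first growth rule over 4-connected neighbours are all assumptions made for illustration. It also omits stratification, urban/rural oversampling, and study-area boundaries.

```python
import random


def sample_seeds_pps(pop, n_seeds, rng):
    """Draw distinct seed cells by sequential population-weighted draws.

    Assumption: a simple sequential draw without replacement, so inclusion
    probabilities are only approximately proportional to cell population.
    """
    cells = [(r, c) for r, row in enumerate(pop)
             for c, p in enumerate(row) if p > 0]
    weights = [pop[r][c] for r, c in cells]
    seeds = []
    for _ in range(min(n_seeds, len(cells))):
        i = rng.choices(range(len(cells)), weights=weights, k=1)[0]
        seeds.append(cells.pop(i))
        weights.pop(i)
    return seeds


def grow_psu(pop, seed, min_pop, taken):
    """Grow one PSU outward from `seed` until it holds at least `min_pop` people.

    Hypothetical growth rule: breadth-first over 4-connected neighbours,
    skipping cells in `taken` (other seeds or cells of earlier PSUs).
    Growth also stops if no unassigned neighbours remain, so the returned
    total can fall short of `min_pop`.
    """
    n_rows, n_cols = len(pop), len(pop[0])
    psu = [seed]
    total = pop[seed[0]][seed[1]]
    frontier = [seed]
    while total < min_pop and frontier:
        r, c = frontier.pop(0)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb = (r + dr, c + dc)
            if 0 <= nb[0] < n_rows and 0 <= nb[1] < n_cols and nb not in taken:
                taken.add(nb)
                psu.append(nb)
                total += pop[nb[0]][nb[1]]
                frontier.append(nb)
                if total >= min_pop:
                    break
    return psu, total


def select_psus(pop, n_psus, min_pop, seed=None):
    """Sample seed cells, then grow one non-overlapping PSU around each."""
    rng = random.Random(seed)
    seeds = sample_seeds_pps(pop, n_psus, rng)
    taken = set(seeds)  # reserve every seed so no PSU absorbs another's seed
    return [grow_psu(pop, s, min_pop, taken) for s in seeds]


# Toy 4 x 4 grid of estimated people per cell (made-up numbers).
toy_grid = [
    [10, 0, 5, 20],
    [15, 30, 0, 5],
    [0, 25, 10, 10],
    [5, 5, 40, 0],
]

for cells, total in select_psus(toy_grid, n_psus=2, min_pop=40, seed=1):
    print(f"PSU cells={cells} population={total}")
```

Reserving all seed cells before growth keeps the PSUs disjoint, at the cost that a seed hemmed in by earlier PSUs may not reach the minimum population; a real implementation would need a rule for that case.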

RESULTS

We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda's 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs and 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs and 405 rural PSUs, with a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts, while GridSample reallocated rural-to-urban PSUs across all districts.

CONCLUSIONS

Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, "spin-the-pen"), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fb69/5518145/fc300d0a17ab/12942_2017_98_Fig1_HTML.jpg
