A BAYESIAN HIERARCHICAL SMALL AREA POPULATION MODEL ACCOUNTING FOR DATA SOURCE SPECIFIC METHODOLOGIES FROM AMERICAN COMMUNITY SURVEY, POPULATION ESTIMATES PROGRAM, AND DECENNIAL CENSUS DATA.

Author information

Peterson Emily N, Nethery Rachel C, Padellini Tullia, Chen Jarvis T, Coull Brent A, Piel Frédéric B, Wakefield Jon, Blangiardo Marta, Waller Lance A

Affiliations

Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University.

Department of Biostatistics, Harvard T.H. Chan School of Public Health.

Publication information

Ann Appl Stat. 2024 Jun;18(2):1565-1595. doi: 10.1214/23-aoas1849. Epub 2024 Apr 5.

DOI:10.1214/23-aoas1849

PMID:39323985

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11423836/

Abstract

Small area population counts are necessary for many epidemiological studies, yet their quality and accuracy are often not assessed. In the United States, small area population counts are published by the United States Census Bureau (USCB) in the form of decennial census counts, intercensal population projections from the Population Estimates Program (PEP), and American Community Survey (ACS) estimates. Although these three data sources are closely related, they differ in data collection, data availability, and processing methodology, so each set of reported population counts may be subject to different sources and magnitudes of error. Moreover, because each source applies its own post-survey adjustments, the three do not report identical small area population counts. Consequently, in public health studies, small area disease and mortality rates may differ depending on which data source supplies the denominator. To accurately estimate annual small area population counts and their associated uncertainties, we present a Bayesian population (BPop) model that fuses information from all three USCB sources while accounting for data-source-specific methodologies and errors. Given the observed trends in all three USCB population estimates, we produce comprehensive small area, race-stratified estimates of the true population along with their uncertainties. The main features of our framework are: (1) a single model integrating multiple data sources, (2) explicit modeling of each data source's data-generating mechanism and its specific errors, and (3) prediction of population counts for years without USCB-reported data. We focus on the Black-only and White-only populations of Georgia's 159 counties and produce estimates for 2006-2023. We compare BPop population estimates to decennial census counts, PEP annual counts, and ACS multi-year estimates.
Additionally, we illustrate and explain the different types of data-source-specific errors. Finally, we compare model performance using simulations and validation exercises. Our Bayesian population model can be extended to other applications at finer spatial granularity, to demographic subpopulations further defined by race, age, and sex, and/or to other geographical regions.
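The abstract's point that a rate can change with the choice of denominator source, and the general idea of pooling several error-prone population estimates, can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration: the counts, standard errors, and death count are invented, and the pooling step is plain inverse-variance weighting, which assumes independent, unbiased Gaussian errors. It is a deliberately naive stand-in and not the BPop model, which the authors describe as a Bayesian hierarchical model of each source's data-generating mechanism. The one external fact used is that ACS margins of error are published at the 90% confidence level, so a standard error can be recovered as MOE / 1.645.

```python
"""Toy illustration with hypothetical numbers (not from the paper)."""
from __future__ import annotations

from dataclasses import dataclass

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


@dataclass
class SourceEstimate:
    name: str
    count: float  # reported population count (hypothetical)
    se: float     # assumed standard error of that count (hypothetical)


def se_from_acs_moe(moe: float) -> float:
    """Convert an ACS 90% margin of error into a standard error."""
    return moe / Z_90


def rate_per_100k(events: int, population: float) -> float:
    """Crude rate per 100,000 population for a given denominator."""
    return 1e5 * events / population


def inverse_variance_pool(estimates: list[SourceEstimate]) -> tuple[float, float]:
    """Precision-weighted mean count and its standard error.

    Assumes independent, unbiased Gaussian errors; a naive stand-in,
    not the BPop model.
    """
    weights = [1.0 / e.se ** 2 for e in estimates]
    total = sum(weights)
    mean = sum(w * e.count for w, e in zip(weights, estimates)) / total
    return mean, total ** -0.5


# One hypothetical county-year; every number here is invented.
sources = [
    SourceEstimate("Census", 20_000, 200.0),
    SourceEstimate("PEP", 20_400, 400.0),
    SourceEstimate("ACS", 19_500, se_from_acs_moe(1_645.0)),
]
deaths = 180  # hypothetical numerator (event count)

# Same numerator, three denominators: the rate shifts with the source.
for s in sources:
    print(f"{s.name:>6}: {rate_per_100k(deaths, s.count):.1f} per 100k")

pooled, pooled_se = inverse_variance_pool(sources)
print(f"Pooled: count {pooled:.0f} (SE {pooled_se:.0f}), "
      f"rate {rate_per_100k(deaths, pooled):.1f} per 100k")
```

With these invented inputs the single-source rates come out at roughly 900, 882, and 923 per 100,000, and the pooled count is pulled toward the most precise source. Because the sketch treats every source as unbiased noise, it cannot represent the systematic, data-source-specific errors (such as post-survey adjustments) that the abstract emphasizes, nor predict counts for years without reported data; those are the gaps a full hierarchical model is meant to fill.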

Similar articles

Comparing denominator sources for real-time disease incidence modeling: American Community Survey and WorldPop.

SSM Popul Health. 2021 Apr 8;14:100786. doi: 10.1016/j.ssmph.2021.100786. eCollection 2021 Jun.

Impact of intercensal population projections and error of closure on breast cancer surveillance: examples from 10 California counties.

Breast Cancer Res. 2005;7(5):R655-60. doi: 10.1186/bcr1266. Epub 2005 Jun 7.

Intercensal and Postcensal Estimation of Population Size for Small Geographic Areas in the United States.

Int J Popul Data Sci. 2020 Aug 13;5(1):1160. doi: 10.23889/ijpds.v5i1.1160.

Impacts of Regulations on Air Quality and Emergency Department Visits in the Atlanta Metropolitan Area, 1999-2013.

Res Rep Health Eff Inst. 2018 Apr;2018(195):1-93.

Impact of Differential Privacy and Census Tract Data Source (Decennial Census Versus American Community Survey) for Monitoring Health Inequities.

Am J Public Health. 2021 Feb;111(2):265-268. doi: 10.2105/AJPH.2020.305989. Epub 2020 Dec 22.

The effect of revised populations on mortality statistics for the United States, 2000.

Natl Vital Stat Rep. 2003 Jun 5;51(9):1-24.

Evaluating Linearly Interpolated Intercensal Estimates of Demographic and Socioeconomic Characteristics of U.S. Counties and Census Tracts 2001-2009.

Popul Res Policy Rev. 2015 Aug;34(4):541-59. doi: 10.1007/s11113-015-9359-8. Epub 2015 Jul 2.

Patterns and causes of uncertainty in the American Community Survey.

Appl Geogr. 2014 Jan;46:147-157. doi: 10.1016/j.apgeog.2013.11.002.

Mortality-Air Pollution Associations in Low Exposure Environments (MAPLE): Phase 2.

Res Rep Health Eff Inst. 2022 Jul;2022(212):1-91.

Cited by

Estimating Prevalence of Opioid Misuse in North Carolina Counties From 2016 to 2021: An Integrated Abundance Model Approach.

Epidemiology. 2025 May 1;36(3):310-318. doi: 10.1097/EDE.0000000000001838. Epub 2025 Jan 24.
