Constructing synthetic populations in the age of big data.

Affiliations

Centre for Nutrition, Prevention and Health Services, RIVM (National Institute for Public Health and the Environment), P.O. Box 1, Mailbox 86, 3720 BA, Bilthoven, The Netherlands.

Capaciteit Orgaan (Advisory Committee on Medical Manpower Planning), Mercatorlaan 1200, 3525 BL, Utrecht, The Netherlands.

Publication information

Popul Health Metr. 2023 Oct 31;21(1):19. doi: 10.1186/s12963-023-00319-5.

DOI:10.1186/s12963-023-00319-5

PMID:37907904

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10617102/

Abstract

BACKGROUND

Developing public health intervention models with micro-simulation requires extensive personal information about inhabitants, such as socio-demographic, economic and health data. Such data must remain confidential, yet the data used in a simulation should reflect realistic scenarios. Data of this kind can be collected only in secured environments and are not directly available to open-source micro-simulation models. This paper illustrates a method for constructing synthetic data by predicting individual features with models based on confidential data on health and socio-economic determinants of the entire Dutch population.

METHODS

Administrative records and health registry data were linked to socio-economic characteristics and self-reported lifestyle factors. All socio-demographic information was available for the entire Dutch population (n = 16,778,708); lifestyle factors were available only from the 2012 Dutch Health Monitor (n = 370,835). Regression models were used to predict individual features sequentially.
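
The sequential idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the regression models, so empirical conditional frequency tables stand in for them here, and the feature names (`sex`, `age`, `smoker`), the toy records, and the helper functions `fit_sequential` and `synthesize` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_sequential(records, order):
    """For each feature in `order`, tabulate its distribution conditional
    on all features that precede it (a stand-in for a regression model)."""
    models = []
    for i, feature in enumerate(order):
        parents = order[:i]
        table = defaultdict(Counter)
        for rec in records:
            key = tuple(rec[p] for p in parents)
            table[key][rec[feature]] += 1
        models.append((feature, parents, table))
    return models


def synthesize(models, n, rng):
    """Draw n synthetic individuals, one feature at a time, each drawn
    conditional on the values already generated for that individual."""
    people = []
    for _ in range(n):
        person = {}
        for feature, parents, table in models:
            counts = table[tuple(person[p] for p in parents)]
            values = list(counts)
            weights = [counts[v] for v in values]
            person[feature] = rng.choices(values, weights=weights)[0]
        people.append(person)
    return people


# Hypothetical toy data; not taken from the paper.
toy = (
    [{"sex": "F", "age": "young", "smoker": "no"}] * 40
    + [{"sex": "F", "age": "old", "smoker": "yes"}] * 10
    + [{"sex": "M", "age": "young", "smoker": "yes"}] * 30
    + [{"sex": "M", "age": "old", "smoker": "no"}] * 20
)
models = fit_sequential(toy, ["sex", "age", "smoker"])
synthetic = synthesize(models, 1000, random.Random(0))
```

Because each feature is drawn conditional on the features generated before it, any error in an early step is passed on to every later step, which is why the order of the features matters in a procedure of this kind.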

RESULTS

The synthetic population resembles the original confidential population. Features predicted in the first stages of the sequential procedure are virtually identical to those in the original population, whereas features predicted in later stages accumulate limitations arising from data quality and from previously modelled features.
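
One simple way to quantify how closely a synthetic feature resembles its original is to compare marginal distributions. The sketch below uses the total variation distance; this metric is an assumption chosen for illustration, since the abstract does not state which comparison the authors used, and the records and the helper names `marginal` and `total_variation` are hypothetical.

```python
from collections import Counter


def marginal(records, feature):
    """Relative frequency of each value of `feature` in `records`."""
    counts = Counter(rec[feature] for rec in records)
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions:
    0 means identical marginals, 1 means disjoint support."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Hypothetical records; not taken from the paper.
original = [{"smoker": "yes"}] * 25 + [{"smoker": "no"}] * 75
synthetic = [{"smoker": "yes"}] * 30 + [{"smoker": "no"}] * 70

tvd = total_variation(marginal(original, "smoker"),
                      marginal(synthetic, "smoker"))
```

Computing such a distance for each feature in the order it was generated would show whether later features drift further from the original than earlier ones.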

CONCLUSIONS

By combining socio-demographic, economic, health and lifestyle-related data at the individual level on a large scale, our method provides a powerful tool for constructing a good-quality synthetic population that is free of confidentiality issues.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aafe/10617102/bdd902dde5b9/12963_2023_319_Fig1_HTML.jpg
