Centre for Nutrition, Prevention and Health Services, RIVM (National Institute for Public Health and the Environment), P.O. Box 1, Mailbox 86, 3720 BA, Bilthoven, The Netherlands.
Capaciteit Orgaan (Advisory Committee on Medical Manpower Planning), Mercatorlaan 1200, 3525 BL, Utrecht, The Netherlands.
Popul Health Metr. 2023 Oct 31;21(1):19. doi: 10.1186/s12963-023-00319-5.
Developing public health intervention models with micro-simulations requires extensive personal information about inhabitants, such as socio-demographic, economic and health data. Such data are inherently confidential, yet the models need data that reflect realistic scenarios. Data of this kind can be collected only in secured environments and are not directly available for open-source micro-simulation models. The aim of this paper is to illustrate a method for constructing synthetic data by predicting individual features with models based on confidential data on health and socio-economic determinants of the entire Dutch population.
Administrative records and health registry data were linked to socio-economic characteristics and self-reported lifestyle factors. For the entire Dutch population (n = 16,778,708), all socio-demographic information except lifestyle factors was available. Lifestyle factors were available from the 2012 Dutch Health Monitor (n = 370,835). Regression models were used to sequentially predict individual features.
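The sequential idea can be illustrated with a minimal sketch, in which each feature is drawn conditional on the features generated before it. This is not the paper's implementation: the abstract does not specify the regression models, so simple conditional frequency tables are swapped in as a stand-in, and the feature names, toy records, and the functions `fit_sequential` and `synthesize` are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_sequential(records, order):
    """For each feature in `order`, tabulate its distribution
    conditional on all features that come earlier in `order`.
    (Stand-in for the regression models; an assumption of this sketch.)"""
    models = []
    for i, feature in enumerate(order):
        parents = order[:i]
        table = defaultdict(Counter)
        for rec in records:
            key = tuple(rec[p] for p in parents)
            table[key][rec[feature]] += 1
        models.append((feature, parents, table))
    return models

def synthesize(models, n, seed=0):
    """Generate n synthetic persons, one feature at a time,
    each drawn given the values already generated for that person."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        person = {}
        for feature, parents, table in models:
            key = tuple(person[p] for p in parents)
            counts = table[key]
            values = list(counts)
            weights = [counts[v] for v in values]
            person[feature] = rng.choices(values, weights=weights)[0]
        synthetic.append(person)
    return synthetic

# Hypothetical toy records; the real inputs are confidential microdata.
records = [
    {"sex": "F", "age_group": "young", "smoker": "no"},
    {"sex": "F", "age_group": "old", "smoker": "no"},
    {"sex": "M", "age_group": "young", "smoker": "yes"},
    {"sex": "M", "age_group": "old", "smoker": "no"},
    {"sex": "M", "age_group": "young", "smoker": "no"},
]
order = ["sex", "age_group", "smoker"]

models = fit_sequential(records, order)
synthetic = synthesize(models, n=20, seed=1)
```

Because these tables condition on every earlier feature, the sketch can only reproduce combinations that occur in the input. A parametric regression, as described above, would smooth over sparse combinations, which matters both for realism and for confidentiality.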
The synthetic population resembles the original confidential population. Features predicted in the first stages of the sequential procedure are virtually identical to those in the original population, while those predicted in later stages carry the accumulated limitations of data quality and of previously modelled features.
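One simple way to quantify how closely a synthetic feature resembles its original counterpart is the total variation distance between the two marginal distributions. The sketch below is only an illustration: the abstract does not state which comparison metric was used, and the function name and toy values are hypothetical.

```python
from collections import Counter

def total_variation(original, synthetic):
    """Total variation distance between two categorical samples:
    0 means identical marginal distributions, 1 means disjoint ones."""
    p, q = Counter(original), Counter(synthetic)
    n_p, n_q = sum(p.values()), sum(q.values())
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in categories)

# Hypothetical toy values, not results from the paper.
original_smoking = ["no"] * 70 + ["yes"] * 30
synthetic_smoking = ["no"] * 65 + ["yes"] * 35
tvd = total_variation(original_smoking, synthetic_smoking)
```

Computing this per feature, in the order the features were predicted, would make any accumulation of error across later stages visible.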
By combining socio-demographic, economic, health and lifestyle-related data at the individual level on a large scale, our method provides a powerful tool for constructing a good-quality synthetic population free of confidentiality issues.