Parsaeian Mahboubeh, Jafari Khaledi Majid, Farzadfar Farshad, Mahdavi Mahdi, Zeraati Hojjat, Mahmoudi Mahmood, Khosravi Ardeshir, Mohammad Kazem
Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran.
Department of Statistics, Faculty of Mathematical Sciences, Tarbiat Modares University, Tehran, Iran.
Stat Med. 2021 Feb 20;40(4):1021-1033. doi: 10.1002/sim.8817. Epub 2020 Dec 6.
Data used to estimate the burden of diseases (BOD) are usually sparse, noisy, and heterogeneous. They are collected from surveys, registries, and systematic reviews that use different areal units, are conducted at different times, and report results for different age groups. In this study, we developed a Bayesian geostatistical model that combines aggregated, sparse, and noisy BOD data from different sources with misaligned areal units. The model incorporates spatial, temporal, and age correlation to estimate health indicators for areas with no data or only a small number of observations. It also accounts for the heterogeneity of data sources and the measurement error of the input data, reflecting both in the final estimates and their uncertainty intervals. We applied the model to combine body mass index (BMI) data from nine different sources in a national and sub-national BOD study. Cross-validation showed high out-of-sample predictive ability despite the sparse and noisy data. The proposed model can be used in other BOD studies, especially at the sub-national level when areal units are misaligned.
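To make the idea of combining aggregated reports over misaligned areal units more concrete, here is a minimal Python sketch of one common way to write such a data model. It is an illustration under stated assumptions, not the authors' formulation. It assumes (hypothetically) that each source reports a mean over a set of fine areal units. The expected value of that report is taken to be the population-weighted average of latent unit-level means `theta`. Source heterogeneity is modeled as a source-specific bias (`bias`) plus extra variance (`tau`), which is added to the reported sampling variance (measurement error). In a full Bayesian model, the spatial, temporal, and age correlation would enter as a prior on `theta`, which is not shown here. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    """One aggregated data point from a source (hypothetical structure)."""
    source: str        # name of the survey, registry, or review
    units: list[int]   # indices of the fine areal units the report covers
    value: float       # reported mean (e.g. mean BMI)
    se: float          # reported standard error (measurement error)


def expected_value(obs: Observation, theta: list[float], pop: list[float]) -> float:
    """Population-weighted average of the latent fine-unit means over the reported area."""
    total = sum(pop[i] for i in obs.units)
    return sum(pop[i] * theta[i] for i in obs.units) / total


def log_likelihood(observations, theta, pop, bias, tau) -> float:
    """Gaussian log-likelihood with an assumed source-specific bias and extra variance tau**2."""
    ll = 0.0
    for obs in observations:
        mu = expected_value(obs, theta, pop) + bias[obs.source]
        var = obs.se ** 2 + tau[obs.source] ** 2  # sampling error + source heterogeneity
        ll += -0.5 * (math.log(2 * math.pi * var) + (obs.value - mu) ** 2 / var)
    return ll


# Hypothetical usage: three districts, one coarse (national) report and one district report.
theta = [25.0, 26.0, 27.0]  # latent mean BMI per district
pop = [100, 200, 100]       # population weights
obs = [
    Observation("survey", [0, 1, 2], 26.0, 0.5),
    Observation("registry", [2], 27.5, 1.0),
]
print(expected_value(obs[0], theta, pop))  # → 26.0
print(log_likelihood(obs, theta, pop, {"survey": 0.0, "registry": 0.5},
                     {"survey": 0.0, "registry": 0.2}))
```

In a full Bayesian fit, a likelihood like this would be combined with the correlated prior on `theta` and sampled, for example by MCMC. The population weighting is what allows a report for a coarse area to inform every fine unit it covers.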