Baker Jannah, White Nicole, Mengersen Kerrie
Queensland University of Technology School of Mathematical Sciences, Brisbane, Australia.
Int J Health Geogr. 2014 Nov 20;13:47. doi: 10.1186/1476-072X-13-47.
Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, with missing values affecting anywhere from a few variables to many. For spatial analyses of health outcomes, selecting an appropriate imputation method is critical to producing the most accurate inferences.
We present a cross-validation approach for selecting among three imputation methods for health survey data with correlated lifestyle covariates, using type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs) as a case study. We compare the accuracy of mean imputation with that of imputation using multivariate normal and conditional autoregressive prior distributions.
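As a rough illustration of the cross-validation idea, the sketch below hides observed entries in turn, imputes them, and scores each method by root-mean-square error on the hidden values. It is not the authors' implementation: the function names (`mean_impute`, `mvn_impute`, `cv_rmse`), the five-fold split, the RMSE criterion, and the synthetic 71-by-3 data are all assumptions made for illustration. The multivariate normal variant here is a simple conditional-mean plug-in rather than a Bayesian prior, and the conditional autoregressive method is omitted because it requires an LGA adjacency structure.

```python
import numpy as np


def mean_impute(X):
    """Fill each missing entry with the mean of the observed values in its column."""
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = col_means[cols]
    return filled


def mvn_impute(X):
    """Fill missing entries with their conditional mean under a multivariate
    normal fitted to the complete rows (a plug-in stand-in, not a Bayesian prior)."""
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    mu = complete.mean(axis=0)
    sigma = np.cov(complete, rowvar=False)
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            # Nothing observed in this row: fall back to the fitted mean.
            filled[i, miss] = mu[miss]
            continue
        s_oo = sigma[np.ix_(obs, obs)]
        s_mo = sigma[np.ix_(miss, obs)]
        filled[i, miss] = mu[miss] + s_mo @ np.linalg.solve(s_oo, X[i, obs] - mu[obs])
    return filled


def cv_rmse(X, impute, n_folds=5, seed=0):
    """Hide each fold of observed entries, impute them, and return the RMSE
    between imputed and true values pooled over all folds."""
    rng = np.random.default_rng(seed)
    obs_idx = np.argwhere(~np.isnan(X))
    rng.shuffle(obs_idx)  # shuffles the (row, col) pairs along the first axis
    sq_err = []
    for fold in np.array_split(obs_idx, n_folds):
        r, c = fold[:, 0], fold[:, 1]
        masked = X.copy()
        masked[r, c] = np.nan
        filled = impute(masked)
        sq_err.append((filled[r, c] - X[r, c]) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(sq_err))))


# Synthetic stand-in data (hypothetical): 71 areas, 3 correlated covariates,
# with roughly 10% of entries missing at random.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6, 0.4], [0.6, 1.0, 0.5], [0.4, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(3), cov, size=71)
X[rng.random(X.shape) < 0.1] = np.nan

scores = {
    "mean": cv_rmse(X, mean_impute),
    "mvn": cv_rmse(X, mvn_impute),
}
best = min(scores, key=scores.get)
print(scores, "selected:", best)
```

The method with the lowest held-out error is selected. Which method wins depends on the data, so the more complex method is not guaranteed to come out ahead.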
The best choice of imputation method depends on the application and is not necessarily the most complex one. In this application, mean imputation was selected as the most accurate method.
Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows a more complete analysis of geographic risk factors for disease and gives greater confidence in the results used to inform public policy decision-making.