Burgard J P, Krause J, Münnich R
Department of Economic and Social Statistics, Trier University, Trier, Germany.
J Appl Stat. 2020 May 14;48(9):1659-1674. doi: 10.1080/02664763.2020.1765323. eCollection 2021.
Hypertension is a highly prevalent cardiovascular disease and a considerable cost factor for many national health systems. Despite its prevalence, its regional distribution is often unknown and must be estimated from survey data. However, because of limited resources, health surveys frequently lack sufficient regional observations. The resulting prevalence estimates suffer from unacceptably large sampling variances and are therefore unreliable. Small area estimation addresses this problem by linking auxiliary data from multiple regions in suitable regression models. Typically, either unit-level or area-level observations are used for this purpose. For hypertension, however, both levels should be used. Hypertension has characteristic comorbidities and is strongly related to lifestyle features, which are unit-level information. It is also correlated with socioeconomic indicators that are usually measured at the area level. Combining the two levels is challenging, because it requires estimating the parameters of a multi-level model from small samples. We use a multi-level small area model with level-specific penalization to overcome this issue. Model parameters are estimated via stochastic coordinate gradient descent. A jackknife estimator of the mean squared error is presented. The methodology is applied to combine health survey data and administrative records to estimate regional hypertension prevalence in Germany.
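The abstract does not specify the model or the estimation algorithm in detail, so the following is only a simplified, hypothetical sketch of the two ideas it names: level-specific penalization and stochastic coordinate gradient descent. It assumes a plain logistic regression with fixed effects only (no random area effects), ridge penalties with one strength for unit-level coefficients (`lam_unit`) and another for area-level coefficients (`lam_area`), and area-level covariates copied to each unit through an area index. All function names, parameters, and the synthetic data are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def build_design(X_unit, X_area, area_idx):
    # Hypothetical helper: append each unit's area-level covariates to its unit-level ones.
    return np.hstack([X_unit, X_area[area_idx]])

def penalty_weights(p_unit, p_area, lam_unit, lam_area):
    # Level-specific ridge strengths: one value per coordinate, chosen by level.
    return np.concatenate([np.full(p_unit, lam_unit), np.full(p_area, lam_area)])

def penalized_loss(beta, X, y, lam):
    # Mean logistic log-loss, log(1 + exp(z)) - y * z, plus the weighted ridge penalty.
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z) + np.sum(lam * beta ** 2)

def scgd(X, y, lam, step=0.5, n_iter=5000, batch=64, seed=0):
    # Stochastic coordinate gradient descent: each iteration updates one randomly
    # chosen coefficient using a gradient estimated on a random mini-batch of units.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        j = rng.integers(p)                   # random coordinate
        rows = rng.integers(n, size=batch)    # random mini-batch (with replacement)
        prob = 1.0 / (1.0 + np.exp(-X[rows] @ beta))
        grad_j = X[rows, j] @ (prob - y[rows]) / batch + 2.0 * lam[j] * beta[j]
        beta[j] -= step * grad_j
    return beta

# Synthetic example (made-up data): 20 areas, 3 unit-level and 2 area-level covariates.
rng = np.random.default_rng(1)
n_areas, n = 20, 2000
area_idx = rng.integers(n_areas, size=n)
X_unit = rng.normal(size=(n, 3))
X_area = rng.normal(size=(n_areas, 2))
X = build_design(X_unit, X_area, area_idx)
true_beta = np.array([1.0, -0.5, 0.8, 0.6, -0.4])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Area-level coefficients are penalized more strongly, since they rest on few areas.
lam = penalty_weights(3, 2, lam_unit=1e-3, lam_area=1e-1)
beta_hat = scgd(X, y, lam)
print(beta_hat)
```

Giving the area-level block its own, larger penalty reflects the small number of areas that inform those coefficients; how the actual paper chooses or tunes the level-specific penalties is not stated in the abstract.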