Aregay Mehreteab, Lawson Andrew B, Faes Christel, Kirby Russell S, Carroll Rachel, Watjou Kevin
Department of Public Health, Medical University of South Carolina, Charleston, SC, USA.
Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium.
Environmetrics. 2018 Feb;29(1). doi: 10.1002/env.2477. Epub 2017 Oct 1.
Our primary focus is the spatial distribution of disease incidence at different geographical levels. Spatial data are often available in aggregated form at multiple scale levels, such as census tract, county, and state. When data are aggregated from a fine (e.g., county) to a coarse (e.g., state) geographical level, information is lost. The problem is more challenging when excess zeros are present at the fine level, because aggregation reduces the proportion of zeros at the coarse level. Ignoring the zero inflation and the aggregation effect can yield inconsistent risk estimates at the fine and coarse levels. In this paper, we address these problems using zero-inflated multiscale models that jointly describe the risk variations at different geographical levels. For the excess zeros at the fine level, we use a zero-inflated convolution model, whereas we use a regular convolution model for the smoothed data at the coarse level. These methods provide consistent risk estimates at the fine and coarse levels when a high percentage of structural zeros is present in the data.
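The aggregation effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' multiscale model: it has no spatial structure, no convolution random effects, and no model fitting. It only draws independent zero-inflated Poisson counts for hypothetical fine-level units, sums them into coarse-level units, and compares the share of zeros at the two levels. All names and parameter values (`zero_prob`, `mean`, `units_per_coarse`, the seed) are assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_fine_counts(rng, n_fine, zero_prob, mean):
    """Zero-inflated Poisson: a structural zero with probability zero_prob, otherwise Poisson(mean)."""
    return [
        0 if rng.random() < zero_prob else sample_poisson(rng, mean)
        for _ in range(n_fine)
    ]


def aggregate(fine_counts, units_per_coarse):
    """Sum consecutive blocks of fine-level counts into coarse-level counts."""
    return [
        sum(fine_counts[i:i + units_per_coarse])
        for i in range(0, len(fine_counts), units_per_coarse)
    ]


def zero_fraction(counts):
    """Share of units whose count is exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Hypothetical setup: 2000 fine units (e.g., counties) grouped into
# 100 coarse units (e.g., states) of 20 fine units each.
rng = random.Random(2477)
fine = simulate_fine_counts(rng, n_fine=2000, zero_prob=0.6, mean=1.5)
coarse = aggregate(fine, units_per_coarse=20)

print(f"fine-level zero fraction:   {zero_fraction(fine):.2f}")
print(f"coarse-level zero fraction: {zero_fraction(coarse):.2f}")
```

Under these assumed settings, most fine-level units are zero while almost no coarse-level unit is, which is why a zero-inflated model at the fine level can be paired with a regular count model at the coarse level.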