French Jonathan L, Wand Matthew P
Biostatistics, Global Research and Development, Pfizer, Inc, 50 Pequot Avenue, New London, CT 06320, USA.
Biostatistics. 2004 Apr;5(2):177-91. doi: 10.1093/biostatistics/5.2.177.
Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for the cancer cases and suitable control subjects. The use of such point data allows us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, such covariate information is often subject to missingness. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model. Estimates of the linear and non-linear effects are obtained using a mixed effects model representation. We develop an EM algorithm to account for missing data and the random effects. Since the expectation step involves an intractable integral, we estimate the E-step with a Laplace approximation. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.
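The Laplace approximation mentioned in the abstract can be illustrated on a much simpler problem than the paper's E-step. The sketch below is not the authors' algorithm: it assumes a toy logistic model with a single Gaussian random intercept `b` and no missing covariates, and approximates the log of the integral of p(y | b) p(b) over b by a Gaussian expansion around the mode. All names (`log_joint`, `score_and_curvature`, `laplace_log_marginal`, `grid_log_marginal`) and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def log_joint(b, y, eta, sigma):
    """log p(y | b) + log p(b) for a toy random-intercept logistic model.

    Assumed model (illustration only): y_i ~ Bernoulli(expit(eta_i + b)),
    b ~ N(0, sigma^2). Accepts a scalar or 1-D array of b values.
    """
    b = np.atleast_1d(np.asarray(b, dtype=float))
    lin = eta[None, :] + b[:, None]
    loglik = np.sum(y[None, :] * lin - np.logaddexp(0.0, lin), axis=1)
    logprior = -0.5 * (b / sigma) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
    return loglik + logprior


def score_and_curvature(b, y, eta, sigma):
    """First derivative of log_joint in b, and minus its second derivative."""
    p = 1.0 / (1.0 + np.exp(-(eta + b)))
    grad = np.sum(y - p) - b / sigma ** 2
    curv = np.sum(p * (1.0 - p)) + 1.0 / sigma ** 2  # always positive
    return grad, curv


def laplace_log_marginal(y, eta, sigma, n_iter=50, tol=1e-10):
    """Laplace approximation to log of the integral of exp(log_joint) over b."""
    b_hat = 0.0
    for _ in range(n_iter):  # Newton iterations to locate the mode
        grad, curv = score_and_curvature(b_hat, y, eta, sigma)
        step = grad / curv
        b_hat += step
        if abs(step) < tol:
            break
    _, curv = score_and_curvature(b_hat, y, eta, sigma)
    log_approx = log_joint(b_hat, y, eta, sigma)[0] + 0.5 * np.log(2.0 * np.pi / curv)
    return log_approx, b_hat


def grid_log_marginal(y, eta, sigma, half_width=10.0, n=20001):
    """Brute-force reference value: Riemann sum on a fine grid."""
    grid = np.linspace(-half_width * sigma, half_width * sigma, n)
    vals = log_joint(grid, y, eta, sigma)
    m = vals.max()  # subtract the maximum for numerical stability
    dx = grid[1] - grid[0]
    return m + np.log(np.sum(np.exp(vals - m)) * dx)


# Toy data (made up for this sketch, not from the Cape Cod study).
eta = np.linspace(-1.0, 1.0, 20)
y = (np.arange(20) % 3 == 0).astype(float)
sigma = 1.0

laplace_value, b_hat = laplace_log_marginal(y, eta, sigma)
grid_value = grid_log_marginal(y, eta, sigma)
print("Laplace:", laplace_value, "grid:", grid_value)
```

With one scalar random effect the brute-force grid is cheap, so it serves as a check on the approximation. In the setting the abstract describes, the integral runs over missing covariates and many random effects, which is where a grid becomes impractical and a Laplace-type approximation is attractive.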