Christopher J. Paciorek
Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115.
Comput Stat Data Anal. 2007 May 1;51(8):3631-3653. doi: 10.1016/j.csda.2006.11.008.
In epidemiological research, outcomes are frequently non-normal, sample sizes may be large, and effect sizes are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. I focus on binary outcomes, with the risk surface a smooth function of space, but the development herein is relevant for non-normal data in general. I compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models in terms of fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface via the Fourier basis provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being reasonably computationally efficient. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. A Bayesian Markov random field model performs less well statistically than the spectral basis model, but is very computationally efficient. We illustrate the methods on a real dataset of cancer cases in Taiwan. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.
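To make the idea of a spectral basis representation concrete, the following is a minimal sketch, assuming a real Fourier (cosine/sine) basis on the unit square, a logit link for the binary risk surface, and a truncation at integer frequencies up to K. The function names, the truncation K, and the Matérn-like decay of the coefficient variances (parameters `rho` and `nu`) are hypothetical illustrations, not the paper's exact construction, and the Bayesian (MCMC) fitting step is not shown. The point it illustrates is that smoothness is controlled by letting the prior variance of each Fourier coefficient shrink as its frequency grows, so wiggly high-frequency components are a priori small.

```python
import math
import random


def spectral_variance(kx, ky, rho=0.2, nu=2.0):
    # Hypothetical Matern-like decay (assumption, not taken from the paper):
    # prior variance shrinks as the frequency magnitude grows, which makes
    # high-frequency (wiggly) components of the surface a priori small.
    w2 = (2 * math.pi) ** 2 * (kx * kx + ky * ky)
    return (1.0 + rho * rho * w2) ** (-(nu + 1.0))


def frequencies(K):
    # Half-plane of integer frequencies so each (cos, sin) pair appears once;
    # (0, 0) is excluded because the intercept mu plays that role.
    out = []
    for ky in range(0, K + 1):
        for kx in range(-K, K + 1):
            if ky == 0 and kx <= 0:
                continue
            out.append((kx, ky))
    return out


def draw_coefficients(K, rng):
    # Draw (a_k, b_k) ~ N(0, spectral_variance(k)) independently per frequency.
    coefs = {}
    for kx, ky in frequencies(K):
        sd = math.sqrt(spectral_variance(kx, ky))
        coefs[(kx, ky)] = (rng.gauss(0.0, sd), rng.gauss(0.0, sd))
    return coefs


def logit_surface(x, y, mu, coefs):
    # theta(s) = mu + sum_k [a_k cos(2*pi*k.s) + b_k sin(2*pi*k.s)]
    val = mu
    for (kx, ky), (a, b) in coefs.items():
        phase = 2 * math.pi * (kx * x + ky * y)
        val += a * math.cos(phase) + b * math.sin(phase)
    return val


def risk(x, y, mu, coefs):
    # Inverse logit maps the smooth surface to a probability p(s).
    return 1.0 / (1.0 + math.exp(-logit_surface(x, y, mu, coefs)))


def simulate_binary(n, mu, coefs, rng):
    # Uniform random locations on the unit square, Bernoulli(p(s)) outcomes.
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        z = 1 if rng.random() < risk(x, y, mu, coefs) else 0
        data.append((x, y, z))
    return data


if __name__ == "__main__":
    rng = random.Random(1)
    coefs = draw_coefficients(4, rng)
    data = simulate_binary(500, -2.0, coefs, rng)
    print(len(coefs), sum(z for _, _, z in data))
```

With K = 4 the half-plane enumeration yields 2K^2 + 2K = 40 frequency pairs, each carrying one cosine and one sine coefficient. A fitted model would place priors like these on the coefficients and sample them, together with mu and any smoothness parameters, given the observed binary outcomes; that inference step is outside this sketch.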