Computational Techniques for Spatial Logistic Regression with Large Datasets.

Author information

Christopher J. Paciorek

Affiliation

Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115.

Publication information

Comput Stat Data Anal. 2007 May 1;51(8):3631-3653. doi: 10.1016/j.csda.2006.11.008.

Abstract

In epidemiological research, outcomes are frequently non-normal, sample sizes may be large, and effect sizes are often small. Relating health outcomes to geographic risk factors therefore requires fast and powerful methods for fitting spatial models, particularly for non-normal data. I focus on binary outcomes, with the risk surface a smooth function of space, but the development herein is relevant to non-normal data in general. I compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models on the basis of fit, speed, and ease of implementation.

A Bayesian model using a spectral basis representation of the spatial surface via the Fourier basis provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being reasonably computationally efficient. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. A Bayesian Markov random field model performs less well statistically than the spectral basis model, but is very computationally efficient. We illustrate the methods on a real dataset of cancer cases in Taiwan.

The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.
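The spectral basis idea in the abstract can be illustrated with a short simulation: the logit risk surface is written as a weighted sum of Fourier basis functions on a regular grid, the weights get independent normal priors whose variances shrink at higher frequencies, and the FFT evaluates the sum cheaply. This is a minimal sketch in Python/NumPy, not the paper's fitting code. The periodic n x n grid, the Matérn-type spectral density, and all function and parameter names (`simulate_binary`, `rho`, `nu`, `sigma2`, `beta0`) are hypothetical choices for illustration. The sketch covers only the forward model that a Bayesian spectral-basis fit would place priors on; the MCMC fitting step is omitted.

```python
import numpy as np


def matern_spectral_density(n, rho, nu, sigma2):
    """Hypothetical Matern-type spectral density on an n x n periodic grid.

    Returns one variance per Fourier coefficient, rescaled so that the
    resulting spatial field has marginal variance sigma2.
    """
    # Frequencies (cycles per grid unit) in the layout numpy's FFT uses.
    k = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    omega2 = (2 * np.pi) ** 2 * (kx**2 + ky**2)
    # 2-D Matern form: (1/rho^2 + |omega|^2)^-(nu + d/2) with d = 2.
    s = (1.0 / rho**2 + omega2) ** (-(nu + 1.0))
    # With numpy's FFT convention, Var(f_j) = mean(s), so normalize the mean.
    return s * (sigma2 / s.mean())


def simulate_field(n, rho, nu, sigma2, rng):
    """Draw a stationary Gaussian surface as a weighted Fourier expansion."""
    z = rng.standard_normal((n, n))  # real white noise
    s = matern_spectral_density(n, rho, nu, sigma2)
    # Scale each Fourier coefficient by sqrt(spectral variance); the inverse
    # FFT then evaluates the sum of basis functions in O(N log N).
    f = np.fft.ifft2(np.fft.fft2(z) * np.sqrt(s))
    # s is symmetric under k -> -k, so the imaginary part is only round-off.
    return f.real


def simulate_binary(n, beta0, rho, nu, sigma2, rng):
    """Binary outcomes whose logit risk is beta0 plus a smooth surface."""
    f = simulate_field(n, rho, nu, sigma2, rng)
    p = 1.0 / (1.0 + np.exp(-(beta0 + f)))  # inverse logit link
    y = (rng.random((n, n)) < p).astype(int)  # one Bernoulli(p) draw per cell
    return f, p, y
```

For example, `simulate_binary(64, -1.0, 4.0, 1.0, 0.5, np.random.default_rng(0))` returns the surface, the cell-level risk probabilities, and 0/1 outcomes. The design choice worth noting is that smoothness comes entirely from the decaying coefficient variances, and because the basis is applied with the FFT, each evaluation of the surface costs roughly O(N log N) for N grid cells rather than the O(N^3) needed to factorize a dense N x N covariance matrix.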

Similar articles

1. An assessment of estimation methods for generalized linear mixed models with binary outcomes.
   Stat Med. 2013 Nov 20;32(26):4550-66. doi: 10.1002/sim.5866. Epub 2013 Jul 9.
2. Variable selection for binary spatial regression: Penalized quasi-likelihood approach.
   Biometrics. 2016 Dec;72(4):1164-1172. doi: 10.1111/biom.12525. Epub 2016 Apr 8.
3. A comparison of computational algorithms for the Bayesian analysis of clinical trials.
   Clin Trials. 2024 Dec;21(6):689-700. doi: 10.1177/17407745241247334. Epub 2024 May 16.
4. On fitting spatio-temporal disease mapping models using approximate Bayesian inference.
   Stat Methods Med Res. 2014 Dec;23(6):507-30. doi: 10.1177/0962280214527528. Epub 2014 Apr 7.
5. Sample size issues in multilevel logistic regression models.
   PLoS One. 2019 Nov 22;14(11):e0225427. doi: 10.1371/journal.pone.0225427. eCollection 2019.
6. Fitting multilevel models with ordinal outcomes: performance of alternative specifications and methods of estimation.
   Psychol Methods. 2011 Dec;16(4):373-90. doi: 10.1037/a0025813. Epub 2011 Oct 31.

Cited by

1. Spatial patterns and predictors of antenatal care interruption in war-torn Tigray, northern Ethiopia: Spatial modelling approach.
   PLoS One. 2025 Aug 14;20(8):e0328802. doi: 10.1371/journal.pone.0328802. eCollection 2025.
2. Fast binary logistic regression.
   PeerJ Comput Sci. 2025 Jan 30;11:e2579. doi: 10.7717/peerj-cs.2579. eCollection 2025.
3. Analyzing Spatial Dependency of the 2016-2017 Korean HPAI Outbreak to Determine the Effective Culling Radius.
   Int J Environ Res Public Health. 2021 Sep 13;18(18):9643. doi: 10.3390/ijerph18189643.
4. Spatial-temporal generalized additive model for modeling COVID-19 mortality risk in Toronto, Canada.
   Spat Stat. 2022 Jun;49:100526. doi: 10.1016/j.spasta.2021.100526. Epub 2021 Jul 6.
5. An Introductory Framework for Choosing Spatiotemporal Analytical Tools in Population-Level Eco-Epidemiological Research.
   Front Vet Sci. 2020 Jul 7;7:339. doi: 10.3389/fvets.2020.00339. eCollection 2020.
6. Bayesian Modeling for Large Spatial Datasets.
   Wiley Interdiscip Rev Comput Stat. 2012 Jan;4(1):59-66. doi: 10.1002/wics.187.
7. Hierarchical factor models for large spatially misaligned data: a low-rank predictive process approach.
   Biometrics. 2013 Mar;69(1):19-30. doi: 10.1111/j.1541-0420.2012.01832.x. Epub 2013 Feb 4.
