Draidi Areed Wala, Price Aiden, Thompson Helen, Hassan Conor, Malseed Reid, Mengersen Kerrie
School of Mathematical Science, Centre for Data Science, Queensland University of Technology, Brisbane, Australia.
Children's Health Queensland, Herston, Australia.
R Soc Open Sci. 2024 Jun 19;11(6):231780. doi: 10.1098/rsos.231780. eCollection 2024 Jun.
Spatial statistical models are commonly used in geographical settings to ensure that spatial variation is captured effectively. However, spatial models and clustering algorithms can be complex and computationally expensive. One such method is geographically weighted regression (GWR), which was proposed in the geography literature to allow the relationships in a regression model to vary over space. Traditional linear regression models assume regression coefficients that are constant over space. GWR instead estimates regression coefficients locally, at spatially referenced data points. The motivation for adopting GWR is that a single set of constant regression coefficients cannot adequately capture spatially varying relationships between covariates and an outcome variable. GWR has been widely applied in diverse fields, such as ecology, forestry, epidemiology, neurology and astronomy. Frequentist GWR provides point estimates and confidence intervals. Bayesian GWR additionally incorporates prior knowledge and yields probability distributions for the parameters and predictions of interest. This paper pursues three main objectives. First, it introduces covariate effect clustering by combining Bayesian geographically weighted regression (BGWR) with a post-processing step based on a Gaussian mixture model and a Dirichlet process mixture model. Second, it examines, within the Bayesian framework, situations in which a particular covariate is important in one region but not in another. Lastly, it addresses computational challenges in existing BGWR, improving Markov chain Monte Carlo (MCMC) estimation so that it is suitable for large spatial datasets. The efficacy of the proposed method is demonstrated on simulated data. It is further validated in a case study of children's developmental domains in Queensland, Australia, using data provided by Children's Health Queensland and Australia's Early Development Census.
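The core idea of GWR, local coefficients followed by clustering of covariate effects, can be illustrated with a small sketch. It is not the authors' method. The paper uses Bayesian GWR estimated by MCMC, then clusters the resulting covariate effects with a Gaussian mixture model and a Dirichlet process mixture model. Here, as a swapped-in simplification, plain frequentist GWR point estimates stand in for BGWR posterior summaries. A two-component Gaussian mixture fitted by EM stands in for the clustering step. The Gaussian kernel, the bandwidth of 1.0, the simulated data and all function names (`kernel_weights`, `gwr_coefficients`, `gmm_two_components`) are hypothetical choices for illustration. At each site the sketch solves the weighted least-squares problem $\hat\beta_i = (X^\top W_i X)^{-1} X^\top W_i y$, where $W_i$ down-weights observations far from site $i$.

```python
import math

def kernel_weights(coords, site, bandwidth):
    # Hypothetical Gaussian kernel: w_j = exp(-0.5 * (d_ij / h)^2),
    # where d_ij is the Euclidean distance from observation j to the site.
    sx, sy = site
    return [math.exp(-0.5 * (math.hypot(x - sx, y - sy) / bandwidth) ** 2)
            for x, y in coords]

def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a @ beta = b.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta

def gwr_coefficients(X, y, coords, bandwidth):
    # One weighted least-squares fit per site: beta_i = (X^T W_i X)^{-1} X^T W_i y.
    n, p = len(y), len(X[0])
    betas = []
    for site in coords:
        w = kernel_weights(coords, site, bandwidth)
        xtwx = [[sum(w[k] * X[k][a] * X[k][b] for k in range(n)) for b in range(p)]
                for a in range(p)]
        xtwy = [sum(w[k] * X[k][a] * y[k] for k in range(n)) for a in range(p)]
        betas.append(solve_linear(xtwx, xtwy))
    return betas

def gmm_two_components(values, iters=200):
    # Two-component 1-D Gaussian mixture fitted by EM; returns (means, hard labels).
    mu, var, pi = [min(values), max(values)], [1.0, 1.0], [0.5, 0.5]
    for _ in range(iters):
        resp = []
        for v in values:
            # E-step in log space (log-sum-exp) so responsibilities never underflow to 0/0.
            logs = [math.log(pi[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - 0.5 * (v - mu[k]) ** 2 / var[k] for k in range(2)]
            top = max(logs)
            e = [math.exp(lv - top) for lv in logs]
            total = sum(e)
            resp.append([ek / total for ek in e])
        for k in range(2):
            # M-step: mixing weight, mean and variance (variance floored to avoid collapse).
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(values)
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk, 1e-6)
    return mu, [0 if r[0] >= r[1] else 1 for r in resp]

# Hypothetical noise-free data on a 10 x 2 grid: slope 1 in the west (x < 5), slope 3 in the east.
coords = [(x, y) for x in range(10) for y in range(2)]
z = [((3 * x + 5 * y) % 7) / 7 for x, y in coords]
X = [[1.0, zj] for zj in z]
y_obs = [2.0 + (1.0 if x < 5 else 3.0) * zj for (x, _), zj in zip(coords, z)]

slopes = [b[1] for b in gwr_coefficients(X, y_obs, coords, bandwidth=1.0)]
means, labels = gmm_two_components(slopes)
print([round(s, 2) for s in slopes])
print(labels)
```

In this toy setting, the local slopes recover roughly 1 in the west and 3 in the east, with intermediate values near the boundary. The mixture model then separates the sites into two covariate-effect groups. A Dirichlet process mixture would additionally infer the number of groups rather than fixing it at two.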