Rahardiantoro Septian, Sakamoto Wataru
Department of Human Ecology, Graduate School of Environmental and Life Science, Okayama University, Okayama, 700-8350 Japan.
Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Bogor, 16680 Indonesia.
Comput Stat. 2023 Apr 11:1-25. doi: 10.1007/s00180-023-01331-x.
This study addressed the problem of determining multiple potential clusters with regularization approaches for the purpose of spatio-temporal clustering. The generalized lasso framework has the flexibility to incorporate adjacencies between objects in the penalty matrix and to detect multiple clusters. A generalized lasso model with two penalties is proposed, which can be separated into two generalized lasso models: trend filtering of the temporal effect, and a fused lasso of the spatial effect at each time point. To select the tuning parameters, approximate leave-one-out cross-validation (ALOCV) and generalized cross-validation (GCV) are considered. A simulation study is conducted to evaluate the proposed method against other approaches under different problems and different structures of multiple clusters. The generalized lasso with ALOCV and GCV gave smaller MSE in estimating the temporal and spatial effects than the unpenalized method, ridge, lasso, and generalized ridge. In temporal effect detection, the generalized lasso with ALOCV and GCV gave relatively smaller and more stable MSE than the other methods across different structures of the true risk values. In spatial effect detection, the generalized lasso with ALOCV gave a higher edge-detection accuracy index. The simulation also suggested using a common tuning parameter across all time points in spatial clustering. Finally, the proposed method was applied to weekly Covid-19 data in Japan from March 21, 2020, to September 11, 2021, and the dynamic behavior of the multiple clusters was interpreted.
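The two building blocks described in the abstract (a trend-filtering penalty over time and a fused-lasso penalty over spatial adjacencies, each tuned by GCV) can be illustrated with a small sketch. It is not the authors' implementation. It assumes a Gaussian response observed directly (identity design), which is a simplification of the paper's risk-value setting, and it solves each generalized lasso problem with a generic ADMM loop rather than the algorithm used in the paper. Degrees of freedom for GCV are taken as the nullity of the rows of the penalty matrix on which the fit is fused, a known result for the generalized lasso with an identity design. ALOCV is not shown. All function names (`trend_filter_matrix`, `fused_lasso_matrix`, `generalized_lasso_admm`, `gcv_score`, `select_common_lambda`) are hypothetical, and summing GCV over time points to choose one shared tuning parameter is an assumption of this sketch.

```python
import numpy as np


def trend_filter_matrix(n_time, order=1):
    """Difference operator D for trend filtering on equally spaced time points.

    order=0 uses first differences (piecewise-constant fit);
    order=1 uses second differences (piecewise-linear fit).
    """
    D = np.eye(n_time)
    for _ in range(order + 1):
        D = np.diff(D, axis=0)  # each pass applies one more difference
    return D


def fused_lasso_matrix(n_regions, edges):
    """Edge-incidence matrix: one row per adjacent pair (i, j), +1 at i, -1 at j."""
    D = np.zeros((len(edges), n_regions))
    for row, (i, j) in enumerate(edges):
        D[row, i] = 1.0
        D[row, j] = -1.0
    return D


def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def generalized_lasso_admm(y, D, lam, rho=1.0, n_iter=2000):
    """Minimize 0.5*||y - beta||^2 + lam*||D beta||_1 by ADMM with split z = D beta.

    Identity design only (an assumption of this sketch). Returns beta and the
    sparse split variable z, whose zero rows mark where the fit is fused.
    """
    n = y.size
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])  # scaled dual variable
    A = np.eye(n) + rho * D.T @ D  # system matrix of the beta-update
    for _ in range(n_iter):
        beta = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = soft_threshold(D @ beta + u, lam / rho)
        u = u + D @ beta - z
    return beta, z


def gcv_score(y, beta, z, D):
    """GCV = (RSS / n) / (1 - df / n)^2, df = nullity of the fused rows of D."""
    n = y.size
    D_fused = D[z == 0.0]  # rows where the penalty fused the coefficients
    rank = np.linalg.matrix_rank(D_fused) if D_fused.shape[0] > 0 else 0
    df = n - rank
    if df >= n:
        return np.inf, df  # no fusion at all: GCV undefined, treat as worst
    rss = np.sum((y - beta) ** 2)
    return (rss / n) / (1.0 - df / n) ** 2, df


def select_common_lambda(Y, D, lambdas):
    """Choose one lambda shared by all time points (rows of Y) by minimizing
    the total GCV score; the summation rule is an assumption of this sketch."""
    totals = []
    for lam in lambdas:
        total = 0.0
        for y_t in Y:
            beta, z = generalized_lasso_admm(y_t, D, lam)
            total += gcv_score(y_t, beta, z, D)[0]
        totals.append(total)
    return lambdas[int(np.argmin(totals))], totals


if __name__ == "__main__":
    # Toy example: 6 regions on a path graph, two spatial clusters, 4 weeks.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    D_space = fused_lasso_matrix(6, edges)
    rng = np.random.default_rng(0)
    truth = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
    Y = truth + 0.3 * rng.standard_normal((4, 6))
    lam, totals = select_common_lambda(Y, D_space, [0.1, 0.3, 1.0, 3.0])
    print("common lambda:", lam)
```

The same `generalized_lasso_admm` call handles the temporal part if `D` is built with `trend_filter_matrix` instead; only the penalty matrix changes, which is the flexibility of the generalized lasso that the abstract points to.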